Take Home Ex 3: Financial Inclusion in Uganda: An Explanatory Study Using Geographically Weighted Regression

Author

Stephen Tay

Published

November 10, 2024

Modified

November 11, 2024

1. Overview

1.1 Introduction

Financial inclusion is a critical driver of economic growth, macroeconomic stability, and poverty reduction (Nguyen et al., 2021). Grounded in the Schumpeterian model, evidence from countries like Vietnam demonstrates how accessible financial services can empower individuals and businesses to invest, grow, and contribute to broader economic stability. When financial services are widely available, wealth distribution becomes more equitable across social groups. In contrast, when access is limited to a small affluent population, only that group can grow its wealth, while economically disadvantaged households struggle to obtain the funding they need (Hamden et al., 2022; Kaliba, Bishagazi, & Gongwe, 2023).

In Uganda, approximately 76% of the population resides in rural areas, with agriculture as the main source of income (Hamden et al., 2022). Since the introduction of mobile money in 2009, over 80% of Ugandan adults have acquired a mobile money account, making it the most commonly used financial service (FinScope, 2024; Hamden et al., 2022). However, active account usage remains low, with only 49% of account holders using mobile money regularly (Hamden et al., 2022). While mobile phone ownership and internet access have grown substantially, significant gender and regional gaps persist, particularly affecting women and rural or otherwise marginalised communities (FinScope, 2024). These disparities highlight the challenge of creating a fully inclusive financial system that reaches all demographic groups.

Although previous studies have explored factors influencing financial inclusion, they often take a generalised approach, overlooking geographical variations. This study addresses this gap by applying geographically weighted regression (GWR) to analyse the factors influencing financial inclusion in Uganda at the district level. By adopting this approach, this study aims to uncover district-level factors and patterns of financial inclusion, offering insights that can inform targeted policies and interventions.

1.2 Datasets

This study utilises two key datasets:

  • FinScope Uganda 2023 Survey Dataset: This aspatial dataset includes responses from 3,176 Ugandan adults (aged 16 and older), providing insights into attitudes and behaviours around money management, financial products, and services. Respondents were selected through a rigorous stratified sampling process to ensure representativeness.
  • Uganda District Boundaries (2020): Geographical boundary data obtained from geoBoundaries, detailing the administrative district boundaries across Uganda.

1.3 R Packages

The following R packages are loaded for this study:

pacman::p_load(olsrr, corrplot, ggpubr, sf, sfdep, GWmodel, tmap, tidyverse, gtsummary, ggstatsplot, performance, see, readxl)

2. Aspatial Data: Data Wrangling

The FinScope Uganda 2023 Survey Dataset was loaded using the read_excel() function. The analyst conducted a literature review and preliminary analyses to identify relevant survey questions and variables for the study, ensuring that each selected question has no more than 15% missing data. The chosen fields cover demographics, income, digital connectivity and literacy, financial literacy, and various measures of financial inclusion. To streamline analysis, the select() function was used to isolate these variables and, in the same call, rename them for easier identification and interpretation in subsequent analyses.

fin_df <- read_excel("data/aspatial/FinScope-2023_Dataset_Final.xlsx", 
                     sheet="Final_Dataset") %>%
  select(c(id=Interview_ID, district = District,
           age, gender=c2, education=c4, household_size=n1_1,
           rural_urban=Rural_Urban, employment=c5, agribusiness=m6_1,
           income_source1=d2_2_11, income_source2=d2_2_12, income_source3=d2_2_13, 
           income1=d3_31, income2=d3_32, income3=d3_33,
           own_mobile_phone=c7_1_1, is_smartphone=c7_1_4, access_internet=c6_1_2, 
           literacy_mobile=c6_2_1, literacy_internet=c6_2_2,
           finliteracy_plan1=e5_11, finliteracy_plan2=e5_12, finliteracy_plan3=e5_13, 
           finliteracy_plan4=e5_14, finliteracy_plan5=e5_16,  
           finliteracy_save1=f1_1_1, finliteracy_save2=f1_1_3, finliteracy_save3=f1_1_4, 
           finliteracy_aware1=g1_2, finliteracy_aware2=h2_1_3,  finliteracy_aware3=h2_1_4, 
           finliteracy_aware4=h2_1_5, finliteracy_aware5=h2_1_8,  finliteracy_aware6=h2_1_9,
           finincl_risk=j1,
           finincl_save1=f20, finincl_save2=f3_1_1, finincl_save3=f3_1_2, 
           finincl_save4=f3_1_3, finincl_save5=f3_1_4, finincl_save6=f3_1_5, 
           finincl_save7=f3_1_6, finincl_save8=f3_1_8, finincl_save9=f3_1_9,
           finincl_remit1=hpp1_1, finincl_remit2=hpp4_1,
           finincl_pay1=hpb22, finincl_pay2=hpb23, finincl_pay3=hpb24, 
           finincl_pay4=hpb25, finincl_pay5=hpb26, 
           finincl_loan1=g14_1, finincl_loan2=g14_2, finincl_loan3=g14_3, 
           finincl_loan4=g4_1, finincl_loan5=km10_1))

head(fin_df)
# A tibble: 6 × 56
  id       district   age gender education household_size rural_urban employment
  <chr>    <chr>    <dbl>  <dbl>     <dbl>          <dbl> <chr>            <dbl>
1 00100102 ABIM        32      2         6              3 Urban                1
2 00101905 ABIM        37      2         2              3 Urban                5
3 00102802 ABIM        25      2         1              1 Urban                5
4 00103701 ABIM        32      1         2              2 Urban                1
5 00104001 ABIM        40      2         3              1 Urban                4
6 00104704 ABIM        16      1         2              2 Urban                9
# ℹ 48 more variables: agribusiness <dbl>, income_source1 <dbl>,
#   income_source2 <dbl>, income_source3 <dbl>, income1 <dbl>, income2 <dbl>,
#   income3 <dbl>, own_mobile_phone <dbl>, is_smartphone <dbl>,
#   access_internet <dbl>, literacy_mobile <dbl>, literacy_internet <dbl>,
#   finliteracy_plan1 <dbl>, finliteracy_plan2 <dbl>, finliteracy_plan3 <dbl>,
#   finliteracy_plan4 <dbl>, finliteracy_plan5 <dbl>, finliteracy_save1 <dbl>,
#   finliteracy_save2 <dbl>, finliteracy_save3 <dbl>, …

To prepare the data for analysis, the analyst conducted distribution checks and addressed missing data (including coded values such as 999). For brevity, preliminary analyses, including distribution analysis, are not shown in this report.

2.1 Demographics

Demographic variables in this study include:

  • Age: We created four age-band variables to capture meaningful life stages (16–24, 25–34, 35–44, and 45–54) for aggregation at the district level in subsequent regression analysis. This age-group structure represents the district-level age distribution while avoiding perfect multicollinearity by omitting the 55+ group. Each age group was coded using if_else(), tagging respondents within the group as 1, otherwise as 0.
  • Gender: Females were tagged as 1 and males as 0.
  • Education: Four education-level variables (primary, secondary, vocational, and degree) were created, with “no formal education” excluded to prevent multicollinearity.
  • Household Size: Households of five or more members were considered large and tagged as 1.
  • Rural/Urban: Respondents in rural areas were tagged as 1.
  • Employment Status: We created three variables: formal employment, self-employment, and unemployment. An additional variable, “non-working,” was used for data wrangling but excluded from regression analysis.
  • Agricultural Business: Involvement in an agricultural business was tagged as 1, otherwise as 0.

Data preparation involved the mutate(), if_else(), and case_when() functions, with coded missing values (e.g., 999, 998) replaced by NA_real_, the numeric NA type. Variables that are no longer needed are removed using select() with a negated selection, -c().

fin_df1 <- fin_df %>%
  mutate(age16_24 = if_else(age <= 24, 1, 0),
         age25_34 = if_else(age >= 25 & age <= 34, 1, 0),
         age35_44 = if_else(age >= 35 & age <= 44, 1, 0),
         age45_54 = if_else(age >= 45 & age <= 54, 1, 0)) %>%
  mutate(gender_female = if_else(gender == 2, 1, 0)) %>%
  mutate(education_pri = if_else(education %in% c(2,3), 1, 0),
         education_sec = if_else(education %in% c(4,5), 1, 0),
         education_voc = if_else(education %in% c(6,7), 1, 0),
         education_deg = if_else(education == 8, 1, 0)) %>%
  mutate(household_big = if_else(household_size %in% c(2,3), 1, 0)) %>%
  mutate(is_rural = if_else(rural_urban == "Rural", 1, 0)) %>%
  mutate(employment_formal = case_when(employment %in% c(3,4,6) ~ 1,
                                       employment == 99 ~ NA_real_,
                                       TRUE ~ 0),
         employment_self = case_when(employment %in% c(1,2) ~ 1,
                                     employment == 99 ~ NA_real_,
                                     TRUE ~ 0),
         employment_unemployed = case_when(employment == 7 ~ 1,
                                           employment == 99 ~ NA_real_,
                                           TRUE ~ 0),
         employment_nonworking = case_when(employment %in% c(7,5,8,9,10) ~ 1,
                                           employment == 99 ~ NA_real_,
                                           TRUE ~ 0)) %>%
  mutate(is_agribusiness = if_else(agribusiness == 1, 1, 0)) %>%
  select(-c(age, gender, education, household_size, rural_urban, employment, agribusiness))
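
The reason for leaving out one category per group can be illustrated with a small sketch. It uses made-up ages (not survey data) and a hypothetical fifth band, age55_up, which the study deliberately does not create. With an intercept in the model, the five indicators sum to 1 for every respondent, so the design matrix loses a rank; dropping one band restores full column rank.

```r
# Toy ages, invented for illustration only (not from the FinScope data)
age <- c(17, 22, 28, 31, 39, 42, 50, 53, 63, 70)

bands <- cbind(
  age16_24 = as.numeric(age <= 24),
  age25_34 = as.numeric(age >= 25 & age <= 34),
  age35_44 = as.numeric(age >= 35 & age <= 44),
  age45_54 = as.numeric(age >= 45 & age <= 54),
  age55_up = as.numeric(age >= 55)   # hypothetical band, omitted in the study
)

# Every respondent falls in exactly one band, so the indicators sum to 1
band_total <- rowSums(bands)

# Intercept plus all five bands: 6 columns, but they are linearly dependent
X_full <- cbind(intercept = 1, bands)
rank_full <- qr(X_full)$rank

# Dropping the 55+ band leaves 5 linearly independent columns
X_drop <- X_full[, colnames(X_full) != "age55_up"]
rank_drop <- qr(X_drop)$rank

c(cols_full = ncol(X_full), rank_full = rank_full,
  cols_drop = ncol(X_drop), rank_drop = rank_drop)
```

The same argument applies after aggregation, because the district-level proportions across all five bands would also sum to 1.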

2.2 Earned Income

To determine individuals’ earned income, the following steps were performed:

  • Handling Missing Data: income1, income2, and income3 represent the reported income levels of individuals. Values coded as missing in these three variables were replaced with NA_real_.
  • Filtering Earned Income: Earned income was considered only if the income source was employment-related (i.e., not from investments, social transfers, or gifts). If income was derived from these non-earned sources or the individual was not working, the income was set to 0.
  • Selecting Highest Income Bracket: Among the three earned income variables, the highest income bracket was chosen to represent earned income using pmax(). These values could not be summed as they are in income brackets rather than absolute amounts.
  • Categorising Income Levels: We created three earned income categories: low (up to UGX 250K per month), medium (up to UGX 1M per month), and high.

fin_df2 <- fin_df1 %>%
  mutate(income1 = if_else(income1 %in% c(8,9,99,997,998), NA_real_, income1),
         income2 = if_else(income2 %in% c(8,9,99,997,998), NA_real_, income2),
         income3 = if_else(income3 %in% c(8,9,99,997,998), NA_real_, income3),
         earned_income1 = case_when(income_source1 %in% c(5,6,7,8,9,10,11) ~ 0,
                                    income_source1 %in% c(1,2,3,4) ~ income1,
                                    employment_nonworking == 1 ~ 0,
                                    TRUE ~ NA_real_),
         earned_income2 = case_when(income_source2 %in% c(5,6,7,8,9,10,11) ~ 0,
                                    income_source2 %in% c(1,2,3,4) ~ income2,
                                    employment_nonworking == 1 ~ 0,
                                    TRUE ~ NA_real_),
         earned_income3 = case_when(income_source3 %in% c(5,6,7,8,9,10,11) ~ 0,
                                    income_source3 %in% c(1,2,3,4) ~ income3,
                                    employment_nonworking == 1 ~ 0,
                                    TRUE ~ NA_real_),
         earned_income = pmax(earned_income1, earned_income2, earned_income3)) %>%
  mutate(earned_low = case_when(earned_income %in% c(1,2) ~ 1,
                                is.na(earned_income) ~ NA_real_,
                                TRUE ~ 0),
         earned_med = case_when(earned_income %in% c(3,4) ~ 1,
                                is.na(earned_income) ~ NA_real_,
                                TRUE ~ 0),
         earned_high = case_when(earned_income %in% c(5,6,7) ~ 1,
                                is.na(earned_income) ~ NA_real_,
                                TRUE ~ 0))
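
One detail worth checking in this step is how pmax() treats missing brackets. The sketch below uses invented bracket codes, not survey values. By default, pmax() returns NA for a respondent if any of the three inputs is NA, which is the behaviour of the code above; with na.rm = TRUE it would instead take the highest non-missing bracket. Which is appropriate is a modelling choice, and the alternative is shown only for comparison.

```r
# Invented bracket codes for four respondents (illustration only)
earned_income1 <- c(2, 0, NA, 4)
earned_income2 <- c(3, 0, 1, NA)
earned_income3 <- c(0, 0, 0, 0)

# Default: any NA among the three inputs makes the result NA
highest_default <- pmax(earned_income1, earned_income2, earned_income3)

# Alternative: ignore NAs and keep the highest bracket that was reported
highest_narm <- pmax(earned_income1, earned_income2, earned_income3,
                     na.rm = TRUE)

data.frame(highest_default, highest_narm)
```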

Additionally, we created a variable, income_source_cnt, to capture the number of income sources (earned, investment, social, and gift) reported by each individual. Using rowwise(), we counted, for each individual, the income-source fields not coded 10 or 11. ungroup() then removed the row-wise grouping, and variables no longer needed were dropped with select().

fin_df2 <- fin_df2 %>%
  rowwise() %>%
  mutate(income_source_cnt = sum(!income_source1 %in% c(10,11),
                                 !income_source2 %in% c(10,11),
                                 !income_source3 %in% c(10,11))) %>%
  ungroup() %>%
  select(-c(income1, income2, income3, earned_income,
            earned_income1, earned_income2, earned_income3,
            income_source1, income_source2, income_source3,
            employment_nonworking))
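
A point to be aware of when counting with %in% is that NA %in% c(10, 11) evaluates to FALSE, so a missing income-source field would be counted as a source. The sketch below uses invented codes and assumes, as the code above does, that 10 and 11 mark "no income source". Whether NA actually occurs in these fields is not shown in this report, so the stricter helper, has_source(), is a hypothetical alternative rather than a correction.

```r
# Invented income-source codes for four respondents (illustration only)
src1 <- c(1, 5, 10, NA)
src2 <- c(2, 11, 11, 11)
src3 <- c(10, 10, 11, NA)
none <- c(10, 11)   # assumed "no source" codes, as in the code above

# Same logic as the report, written without rowwise(): NA counts as a source
cnt_report <- (!src1 %in% none) + (!src2 %in% none) + (!src3 %in% none)

# Hypothetical stricter rule: a field must be non-missing to count
has_source <- function(x) !is.na(x) & !(x %in% none)
cnt_strict <- has_source(src1) + has_source(src2) + has_source(src3)

data.frame(cnt_report, cnt_strict)
```

Because logical vectors add element-wise, this vectorised form gives the same per-person count as the rowwise() version.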

2.3 Digital Connectivity & Literacy

For digital connectivity and literacy, we considered three variables:

  • Mobile Ownership: Tagged as 1 if the respondent owns a mobile phone (either smartphone or feature phone), otherwise 0.
  • Internet Access: Tagged as 1 if the respondent has internet access, otherwise 0.
  • Digital Literacy: Calculated as the sum of two indicators of respondents’ comfort with using mobile phones and the internet, with 1 point each for literacy_mobile == 1 and literacy_internet == 1, giving a score from 0 to 2.

In the code, we used relocate() to position these variables as the last columns in the data frame. To compute digital literacy, rowwise() was applied before calculating the combined score for each individual. Finally, we removed variables that were no longer needed.

fin_df3 <- fin_df2 %>% 
  mutate(own_mobile_phone = if_else(own_mobile_phone == 1, 1, 0)) %>%
  mutate(access_internet = if_else(access_internet == 1, 1, 0)) %>%
  relocate(c(own_mobile_phone, access_internet), .after = last_col()) %>%
  rowwise() %>%
  mutate(digital_literacy = sum(literacy_mobile == 1,literacy_internet == 1)) %>%
  ungroup() %>%
  select(-c(is_smartphone, literacy_mobile, literacy_internet))

2.4.1 Financial Literacy (Planning/Budgeting)

Financial literacy is multifaceted and encompasses several dimensions. In this study, we assessed financial literacy through three key aspects, calculating a score for each:

  • Financial Planning and Budgeting
  • Saving Behaviours
  • Awareness of Financial Products

For Financial Planning and Budgeting, we computed a composite mean score for each individual based on their responses to the following five survey questions:

  • You keep track of the money that you receive and spend
  • You know how much money you spent last week
  • You adjust your expenses according to the money you have available
  • You make a plan or budget to manage your income and expenses
  • I set long term financial goals and try to achieve them

The five questions were recoded to binary values (1 for positive responses and 0 otherwise). To compute the composite mean score, rowwise() was applied, allowing the mean score to be computed for each individual. We removed variables that were no longer needed.

fin_df4 <- fin_df3 %>%
  mutate(finliteracy_plan1 = if_else(finliteracy_plan1 == 1, 1, 0),
         finliteracy_plan2 = if_else(finliteracy_plan2 == 1, 1, 0),
         finliteracy_plan3 = if_else(finliteracy_plan3 == 1, 1, 0),
         finliteracy_plan4 = if_else(finliteracy_plan4 == 1, 1, 0),
         finliteracy_plan5 = if_else(finliteracy_plan5 == 1, 1, 0)) %>%
  rowwise() %>%
  mutate(finliteracy_plan = mean(c(finliteracy_plan1, finliteracy_plan2, 
                                   finliteracy_plan3, finliteracy_plan4,
                                   finliteracy_plan5))) %>%
  ungroup() %>%
  select(-c(finliteracy_plan1, finliteracy_plan2, finliteracy_plan3, 
            finliteracy_plan4, finliteracy_plan5))
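
Because the items are binary, the composite mean is simply the share of statements a respondent agreed with. As a minimal sketch on invented responses (the column names plan1 to plan5 are placeholders), the rowwise() approach used above can be compared with a vectorised rowMeans() over the same columns; both should give the same score.

```r
library(dplyr)

# Invented binary responses for three respondents (illustration only)
toy <- tibble(
  plan1 = c(1, 0, 1),
  plan2 = c(1, 0, 0),
  plan3 = c(1, 0, 1),
  plan4 = c(1, 0, 0),
  plan5 = c(1, 0, 0)
)

# Row-by-row mean, as in the report
by_row <- toy %>%
  rowwise() %>%
  mutate(plan_score = mean(c(plan1, plan2, plan3, plan4, plan5))) %>%
  ungroup()

# Vectorised equivalent: rowMeans() over the selected columns
vectorised <- toy %>%
  mutate(plan_score = rowMeans(across(plan1:plan5)))

by_row$plan_score
vectorised$plan_score
```

The vectorised form avoids the per-row overhead of rowwise(), which can matter on larger tables; for a survey of this size either is adequate.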

2.4.2 Financial Literacy (Saving Behaviours)

For Saving Behaviours, we computed a composite mean score for each individual based on their responses to the following three survey questions:

  • You sometimes do not buy things that you want so that you save money instead
  • You get information about different ways of savings before you decide where/how to save
  • You try different savings options to find the one where you can get the most interest.

The three questions were recoded to binary values (1 for positive responses and 0 otherwise). To compute the composite mean score, rowwise() was applied, allowing the mean score to be computed for each individual. We removed variables that were no longer needed.

fin_df4 <- fin_df4 %>%
  mutate(finliteracy_save1 = if_else(finliteracy_save1 == 1, 1, 0),
         finliteracy_save2 = if_else(finliteracy_save2 == 1, 1, 0),
         finliteracy_save3 = if_else(finliteracy_save3 == 1, 1, 0)) %>%
  rowwise() %>%
  mutate(finliteracy_save = mean(c(finliteracy_save1, finliteracy_save2, finliteracy_save3))) %>%
  ungroup() %>%
  select(-c(finliteracy_save1, finliteracy_save2, finliteracy_save3))

2.4.3 Financial Literacy (Awareness of Financial Products)

For Awareness of Financial Products, we computed a composite mean score for each individual based on their awareness of the following common financial products:

  • Digital loans
  • Debit Cards
  • Credit Cards
  • Mobile or Internet Banking
  • Mobile Money wallets or E-money wallets
  • Remittance Channels, e.g., MoneyGram, Western Union

The six questions were recoded to binary values (1 for positive responses and 0 otherwise). To compute the composite mean score, rowwise() was applied, allowing the mean score to be computed for each individual. We removed variables that were no longer needed.

fin_df4 <- fin_df4 %>%
  mutate(finliteracy_aware1 = if_else(finliteracy_aware1 == 1, 1, 0),
         finliteracy_aware2 = if_else(finliteracy_aware2 == 1, 1, 0),
         finliteracy_aware3 = if_else(finliteracy_aware3 == 1, 1, 0),
         finliteracy_aware4 = if_else(finliteracy_aware4 == 1, 1, 0),
         finliteracy_aware5 = if_else(finliteracy_aware5 == 1, 1, 0),
         finliteracy_aware6 = if_else(finliteracy_aware6 == 1, 1, 0)) %>%
  rowwise() %>%
  mutate(finliteracy_aware = mean(c(finliteracy_aware1, finliteracy_aware2, 
                                    finliteracy_aware3, finliteracy_aware4,
                                    finliteracy_aware5, finliteracy_aware6))) %>%
  ungroup() %>%
  select(-c(finliteracy_aware1, finliteracy_aware2, finliteracy_aware3, 
            finliteracy_aware4, finliteracy_aware5, finliteracy_aware6))

2.5.1 Financial Inclusion (Insurance Products)

We followed the methodology used by Nguyen et al. (2021) to calculate a composite score for financial inclusion, focusing on aspects aligned with its core definition:

  • Access to Insurance Products
  • Access to Common Savings Mechanisms
  • Access to Remittance Services
  • Access to Common Payment Channels
  • Credit Access

Each respondent could receive a maximum score of 1 for each of insurance, remittance, and credit access. Savings and payments each carry a maximum score of 2, a higher weighting that reflects their centrality and frequent use in everyday financial activity. The total score therefore ranges from 0 to 7.

For insurance products, the assessment relied on a single question:

  • Do you have any existing insurance policy?

Responses were recoded to binary values (1 for positive responses, 0 otherwise) and relocated to the last column of the dataset.

fin_df5 <- fin_df4 %>%
  mutate(finincl_risk = if_else(finincl_risk == 1, 1, 0)) %>%
  relocate(finincl_risk, .after = last_col()) 

2.5.2 Financial Inclusion (Savings Mechanisms)

For Savings Mechanisms, we computed a composite score (maximum 2 points) for each individual based on responses to the following common saving mechanisms:

  • Have you ever saved electronically?
  • Saved in the last 12 months in:
      • Commercial Bank
      • Credit Institution
      • MDI
      • Savings and credit cooperatives (SACCOs) including shares
      • Microfinance Institutions
      • Mobile money
      • Savings group (VSLA, ROSCA)
      • Investment club

Each question was recoded to binary values (1 for positive responses and 0 otherwise). To compute the composite score, rowwise() was applied to sum the responses for each individual. We used pmin() to cap the total at 2 points. Variables no longer needed were removed.

fin_df5 <- fin_df5 %>%
  mutate(finincl_save1 = if_else(finincl_save1 == 1, 1, 0),
         finincl_save2 = case_when(finincl_save2 == 1 ~ 1,
                                   TRUE ~ 0),
         finincl_save3 = case_when(finincl_save3 == 1 ~ 1,
                                   TRUE ~ 0),
         finincl_save4 = case_when(finincl_save4 == 1 ~ 1,
                                   TRUE ~ 0),
         finincl_save5 = case_when(finincl_save5 == 1 ~ 1,
                                   TRUE ~ 0),
         finincl_save6 = case_when(finincl_save6 == 1 ~ 1,
                                   TRUE ~ 0),
         finincl_save7 = case_when(finincl_save7 == 1 ~ 1,
                                   TRUE ~ 0),
         finincl_save8 = case_when(finincl_save8 == 1 ~ 1,
                                   TRUE ~ 0),
         finincl_save9 = case_when(finincl_save9 == 1 ~ 1,
                                   TRUE ~ 0)) %>%
  rowwise() %>%
  mutate(finincl_save = sum(finincl_save1, finincl_save2, finincl_save3,
                            finincl_save4, finincl_save5, finincl_save6,
                            finincl_save7, finincl_save8, finincl_save9),
         finincl_save = pmin(finincl_save, 2)) %>%
  ungroup() %>%
  select(-c(finincl_save1, finincl_save2, finincl_save3,
            finincl_save4, finincl_save5, finincl_save6,
            finincl_save7, finincl_save8, finincl_save9))
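
The code above recodes the first item with if_else() and the remaining items with case_when(..., TRUE ~ 0). The two differ in how they treat missing responses: if_else() leaves NA as NA, whereas the case_when() fallback turns NA into 0. The sketch below shows the difference and the pmin() cap on invented data; the column names save_a to save_c are placeholders, not survey fields.

```r
library(dplyr)

# Invented responses: 1 = yes, 2 = no, NA = not answered (illustration only)
toy <- tibble(
  save_a = c(1, 2, NA, 1),
  save_b = c(1, NA, NA, 1),
  save_c = c(1, 2, 1, NA)
)

scored <- toy %>%
  mutate(
    # if_else() propagates a missing response
    strict_a = if_else(save_a == 1, 1, 0),
    # case_when() with a TRUE fallback maps a missing response to 0
    across(c(save_a, save_b, save_c),
           ~ case_when(.x == 1 ~ 1, TRUE ~ 0)),
    # Sum the indicators, then cap the score at 2 points
    save_score = pmin(save_a + save_b + save_c, 2)
  )

scored
```

Treating a missing response as "did not save" keeps the composite score complete, at the cost of not distinguishing "no" from "not answered".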

2.5.3 Financial Inclusion (Remittance Services)

For Remittance Services, we computed a composite score (maximum 1 point) for each individual based on responses to the following 2 questions:

  • In the past 12 months, have you sent money to someone in a different place within the country or outside of Uganda?
  • In the past 12 months, have you received money from someone in a different place within the country or from outside the country?

Each question was recoded to binary values (1 for positive responses and 0 otherwise). We used pmax() to take the larger of the two indicators, so the score is 1 if the respondent either sent or received money, and 0 otherwise. Variables no longer needed were removed.

fin_df5 <- fin_df5 %>%
  mutate(finincl_remit1 = if_else(finincl_remit1 == 1, 1, 0),
         finincl_remit2 = if_else(finincl_remit2 == 1, 1, 0),
         finincl_remit = pmax(finincl_remit1, finincl_remit2)) %>%
  select(-c(finincl_remit1, finincl_remit2))
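
For binary indicators, taking the element-wise maximum is the same as a logical "either", and also the same as summing and capping at 1. A minimal sketch on invented values (sent and received are placeholder names):

```r
# Invented indicators for four respondents (illustration only)
sent     <- c(1, 0, 0, 1)
received <- c(1, 1, 0, 0)

remit_pmax   <- pmax(sent, received)                   # approach used above
remit_either <- as.numeric(sent == 1 | received == 1)  # logical OR
remit_capped <- pmin(sent + received, 1)               # sum capped at 1

data.frame(sent, received, remit_pmax, remit_either, remit_capped)
```

pmax() is vectorised over rows, which is why this step needs no rowwise() call.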

2.5.4 Financial Inclusion (Payment Channels)

For Payment Channels, we computed a composite score (maximum 2 points) for each individual based on their use of the following five common payment channels in the last 12 months:

  • ATM / Debit card
  • Credit card
  • Bank transfer
  • Mobile money
  • Cheque

Each question was recoded to binary values: any response other than code 5 was treated as use of the channel (1), and code 5 was recoded to 0. To compute the composite score, rowwise() was applied to sum the responses for each individual. We used pmin() to cap the total at 2 points. Variables no longer needed were removed.

fin_df5 <- fin_df5 %>%
  mutate(finincl_pay1 = if_else(finincl_pay1 != 5, 1, 0),
         finincl_pay2 = if_else(finincl_pay2 != 5, 1, 0),
         finincl_pay3 = if_else(finincl_pay3 != 5, 1, 0),
         finincl_pay4 = if_else(finincl_pay4 != 5, 1, 0),
         finincl_pay5 = if_else(finincl_pay5 != 5, 1, 0)) %>%
  rowwise() %>%
  mutate(finincl_pay = sum(finincl_pay1, finincl_pay2, finincl_pay3,
                           finincl_pay4, finincl_pay5),
         finincl_pay = pmin(finincl_pay, 2)) %>%
  ungroup() %>%
  select(-c(finincl_pay1, finincl_pay2, finincl_pay3,
            finincl_pay4, finincl_pay5))

2.5.5 Financial Inclusion (Credit)

For Credit, we computed a composite score (maximum 1 point) for each individual based on responses to the following five questions:

  • Have you ever applied for a loan electronically?
  • Have you ever received a loan disbursement/pay-out electronically?
  • Have you made a loan payment electronically?
  • Have you, in the past 12 months, been paying back money that you borrowed (e.g. mortgage, Boda loan etc) from anybody or any institution?
  • Have you ever borrowed money through mobile money services?

Each question was recoded to binary values (1 for positive responses and 0 otherwise). We used pmax() to take the largest of the five indicators, so the score is 1 if any response is positive, and 0 otherwise. Variables no longer needed were removed.

fin_df5 <- fin_df5 %>%
  mutate(finincl_loan1 = if_else(finincl_loan1 == 1, 1, 0),
         finincl_loan2 = if_else(finincl_loan2 == 1, 1, 0),
         finincl_loan3 = if_else(finincl_loan3 == 1, 1, 0),
         finincl_loan4 = if_else(finincl_loan4 == 1, 1, 0),
         finincl_loan5 = if_else(finincl_loan5 == 1, 1, 0),
         finincl_loan = pmax(finincl_loan1, finincl_loan2, finincl_loan3,
                             finincl_loan4, finincl_loan5)) %>%
  select(-c(finincl_loan1, finincl_loan2, finincl_loan3,
            finincl_loan4, finincl_loan5))

2.5.6 Total Financial Inclusion

The total financial inclusion score for each respondent is the sum of their scores on the five components below, giving a range of 0 to 7:

  • Access to Insurance Products (max 1 point)
  • Access to Common Savings Mechanisms (max 2 points)
  • Access to Remittance Services (max 1 point)
  • Access to Common Payment Channels (max 2 points)
  • Credit Access (max 1 point)

fin_df5 <- fin_df5 %>%
  rowwise() %>%
  mutate(fin_inclusion = sum(finincl_risk, finincl_save, finincl_remit,
                             finincl_pay, finincl_loan)) %>%
  ungroup()

head(fin_df5)
# A tibble: 6 × 33
  id    district age16_24 age25_34 age35_44 age45_54 gender_female education_pri
  <chr> <chr>       <dbl>    <dbl>    <dbl>    <dbl>         <dbl>         <dbl>
1 0010… ABIM            0        1        0        0             1             0
2 0010… ABIM            0        0        1        0             1             1
3 0010… ABIM            0        1        0        0             1             0
4 0010… ABIM            0        1        0        0             0             1
5 0010… ABIM            0        0        1        0             1             1
6 0010… ABIM            1        0        0        0             0             1
# ℹ 25 more variables: education_sec <dbl>, education_voc <dbl>,
#   education_deg <dbl>, household_big <dbl>, is_rural <dbl>,
#   employment_formal <dbl>, employment_self <dbl>,
#   employment_unemployed <dbl>, is_agribusiness <dbl>, earned_low <dbl>,
#   earned_med <dbl>, earned_high <dbl>, income_source_cnt <int>,
#   own_mobile_phone <dbl>, access_internet <dbl>, digital_literacy <int>,
#   finliteracy_plan <dbl>, finliteracy_save <dbl>, finliteracy_aware <dbl>, …
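
As a worked example of the scoring, the sketch below adds up the five components for three invented respondents and confirms that the maximum attainable total is 1 + 2 + 1 + 2 + 1 = 7. The values are made up; only the column names and caps come from the steps above.

```r
# Invented component scores for three respondents (illustration only)
components <- data.frame(
  finincl_risk  = c(0, 1, 1),
  finincl_save  = c(0, 2, 1),
  finincl_remit = c(0, 1, 1),
  finincl_pay   = c(0, 2, 2),
  finincl_loan  = c(0, 1, 0)
)

# Caps per component, as described in the text
max_points <- c(finincl_risk = 1, finincl_save = 2, finincl_remit = 1,
                finincl_pay = 2, finincl_loan = 1)

# Vectorised row total, equivalent to the rowwise() sum above
fin_inclusion <- unname(rowSums(components))

fin_inclusion     # totals for the three toy respondents
sum(max_points)   # highest possible total
```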

2.6 Checking Missing Data

Before aggregating scores at the district level, we examined the proportion of missing data across variables. Employment-related variables have about 0.7% missing data, while earned income variables have 12.7%. All other variables are complete. This level of missing data is considered acceptable, and cases with missing values were retained. During district-level aggregation for employment and earned income, missing values were excluded from the calculation of the mean, which is equivalent to imputing the district mean for these cases.

colMeans(is.na(fin_df5))
                   id              district              age16_24 
          0.000000000           0.000000000           0.000000000 
             age25_34              age35_44              age45_54 
          0.000000000           0.000000000           0.000000000 
        gender_female         education_pri         education_sec 
          0.000000000           0.000000000           0.000000000 
        education_voc         education_deg         household_big 
          0.000000000           0.000000000           0.000000000 
             is_rural     employment_formal       employment_self 
          0.000000000           0.006612091           0.006612091 
employment_unemployed       is_agribusiness            earned_low 
          0.006612091           0.000000000           0.127204030 
           earned_med           earned_high     income_source_cnt 
          0.127204030           0.127204030           0.000000000 
     own_mobile_phone       access_internet      digital_literacy 
          0.000000000           0.000000000           0.000000000 
     finliteracy_plan      finliteracy_save     finliteracy_aware 
          0.000000000           0.000000000           0.000000000 
         finincl_risk          finincl_save         finincl_remit 
          0.000000000           0.000000000           0.000000000 
          finincl_pay          finincl_loan         fin_inclusion 
          0.000000000           0.000000000           0.000000000 
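
The same check can be expressed as percentages and compared against the 15% threshold used when selecting survey questions. This is a minimal sketch on an invented data frame; the values and the resulting percentages are not those of the survey.

```r
# Invented data with some missing values (illustration only)
toy <- data.frame(
  employment_formal = c(1, 0, NA, 1, 0, 1, 0, 0, 1, 0),
  earned_low        = c(1, NA, NA, 0, 1, 1, 0, 1, 0, 1),
  is_rural          = c(1, 1, 0, 0, 1, 1, 1, 0, 1, 1)
)

# is.na() gives a logical matrix; column means are the missing shares
missing_pct <- round(100 * colMeans(is.na(toy)), 1)

# Variables exceeding the 15% threshold
over_threshold <- names(missing_pct)[missing_pct > 15]

missing_pct
over_threshold
```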

2.7 Aggregation to District Level

In the final step of data wrangling, we convert district names to sentence case using the str_to_sentence() function.

Next, we use group_by() and summarise() to aggregate scores at the district level. For each metric, mean() is applied to calculate the average score or, for binary indicators, the proportion of cases with the specified attribute. For the employment and earned income variables, the additional argument na.rm = TRUE excludes missing values from the calculation, which is equivalent to imputing the district mean for these cases.

fin_df6 <- fin_df5 %>%
  mutate(district = str_to_sentence(district)) %>%
  group_by(district) %>%
  summarise(fin_inclusion = mean(fin_inclusion),
            age16_24 = mean(age16_24),
            age25_34 = mean(age25_34),
            age35_44 = mean(age35_44),
            age45_54 = mean(age45_54),
            gender_female = mean(gender_female),
            education_pri = mean(education_pri),
            education_sec = mean(education_sec),
            education_voc = mean(education_voc),
            education_deg = mean(education_deg),
            household_big = mean(household_big),
            is_rural = mean(is_rural),
            employment_formal = mean(employment_formal, na.rm = TRUE),
            employment_self = mean(employment_self, na.rm = TRUE),
            employment_unemployed = mean(employment_unemployed, na.rm = TRUE),
            is_agribusiness = mean(is_agribusiness),
            earned_low = mean(earned_low, na.rm = TRUE),
            earned_med = mean(earned_med, na.rm = TRUE),
            earned_high = mean(earned_high, na.rm = TRUE),
            income_source_cnt = mean(income_source_cnt),
            own_mobile_phone = mean(own_mobile_phone),
            access_internet = mean(access_internet),
            digital_literacy = mean(digital_literacy),
            finliteracy_plan = mean(finliteracy_plan),
            finliteracy_save = mean(finliteracy_save),
            finliteracy_aware = mean(finliteracy_aware)) %>%
  ungroup()
head(fin_df6)
# A tibble: 6 × 27
  district fin_inclusion age16_24 age25_34 age35_44 age45_54 gender_female
  <chr>            <dbl>    <dbl>    <dbl>    <dbl>    <dbl>         <dbl>
1 Abim              2.2     0.15     0.45     0.15    0.15           0.7  
2 Adjumani          3.33    0.233    0.367    0.2     0.1            0.567
3 Agago             2.8     0.333    0.133    0.233   0.0333         0.433
4 Alebtong          1.35    0.3      0.15     0.2     0.1            0.65 
5 Amolatar          2.93    0.133    0.233    0.267   0.267          0.667
6 Amudat            1.2     0.45     0.3      0.2     0              0.6  
# ℹ 20 more variables: education_pri <dbl>, education_sec <dbl>,
#   education_voc <dbl>, education_deg <dbl>, household_big <dbl>,
#   is_rural <dbl>, employment_formal <dbl>, employment_self <dbl>,
#   employment_unemployed <dbl>, is_agribusiness <dbl>, earned_low <dbl>,
#   earned_med <dbl>, earned_high <dbl>, income_source_cnt <dbl>,
#   own_mobile_phone <dbl>, access_internet <dbl>, digital_literacy <dbl>,
#   finliteracy_plan <dbl>, finliteracy_save <dbl>, finliteracy_aware <dbl>

3. Geospatial data: Importing & Data Wrangling

The Uganda district boundaries were imported using st_read(). The layer contains multipolygon features in the WGS 84 geographic coordinate system, so we used st_transform() to convert it to a projected coordinate system (EPSG:21096, Arc 1960 / UTM zone 36N), in which distances are measured in metres.

uga_district <- st_read(dsn = "data/geospatial",
                        layer = "geoBoundaries-UGA-ADM3") %>%
  st_transform(21096)
Reading layer `geoBoundaries-UGA-ADM3' from data source 
  `/Users/stephentay/stephentay/ISSS626-Geospatial-Analytics/Take-home_Ex/Take-home_Ex03/data/geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 137 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 29.58004 ymin: -1.473149 xmax: 34.99872 ymax: 4.215767
Geodetic CRS:  WGS 84

We check the coordinate system using st_crs().

st_crs(uga_district)
Coordinate Reference System:
  User input: EPSG:21096 
  wkt:
PROJCRS["Arc 1960 / UTM zone 36N",
    BASEGEOGCRS["Arc 1960",
        DATUM["Arc 1960",
            ELLIPSOID["Clarke 1880 (RGS)",6378249.145,293.465,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4210]],
    CONVERSION["UTM zone 36N",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",33,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",500000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Engineering survey, topographic mapping."],
        AREA["Kenya - north of equator and west of 36°E; Uganda - north of equator and east of 30°E."],
        BBOX[0,29.99,4.63,36]],
    ID["EPSG",21096]]

The map reveals that some districts contain multiple islands, which may pose challenges when calculating each district’s centroid. Specifically, we want to avoid centroids that fall outside district boundaries, as this could impact subsequent analyses.

tmap_mode('plot')
tm_shape(uga_district) +
  tm_borders()
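
To illustrate why centroids can be a problem for irregular shapes, the sketch below uses base R only and a hypothetical C-shaped polygon (not taken from the Uganda data). The helper names polygon_centroid() and point_in_polygon() are our own and are not part of sf. The sketch shows that the area-weighted centroid of such a shape can land in the notch, outside the polygon.

```r
# Vertices of a hypothetical C-shaped polygon (closed ring, counter-clockwise).
ring <- rbind(c(0, 0), c(4, 0), c(4, 1), c(1, 1), c(1, 3),
              c(4, 3), c(4, 4), c(0, 4), c(0, 0))

# Area-weighted centroid of a simple polygon (shoelace formula).
polygon_centroid <- function(ring) {
  x <- ring[, 1]; y <- ring[, 2]
  i <- seq_len(nrow(ring) - 1)
  cross <- x[i] * y[i + 1] - x[i + 1] * y[i]
  area <- sum(cross) / 2
  c(x = sum((x[i] + x[i + 1]) * cross) / (6 * area),
    y = sum((y[i] + y[i + 1]) * cross) / (6 * area))
}

# Ray-casting test: TRUE if the point lies inside the ring.
point_in_polygon <- function(pt, ring) {
  x <- ring[, 1]; y <- ring[, 2]
  i <- seq_len(nrow(ring) - 1)
  crosses <- ((y[i] > pt[2]) != (y[i + 1] > pt[2])) &
    (pt[1] < (x[i + 1] - x[i]) * (pt[2] - y[i]) / (y[i + 1] - y[i]) + x[i])
  sum(crosses) %% 2 == 1
}

ctr <- polygon_centroid(ring)
ctr                                   # centroid coordinates
point_in_polygon(ctr, ring)           # is the centroid inside the polygon?
point_in_polygon(c(0.5, 2), ring)     # a point in the "spine" of the C
```

A function that returns a point guaranteed to lie on the surface avoids this situation by construction.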

We begin by selecting the required variables using select(), renaming “shapeName” to “district” at the same time. Next, we use st_cast() to convert each multipolygon feature into individual polygon features and apply st_area() to calculate the area of each polygon.

We then group by district and keep only the polygon with the largest area to represent the district. This step effectively removes smaller islands from each district. The area variable is then dropped as it is no longer needed.

Finally, we compute a representative point for each district using st_point_on_surface(). Unlike a geometric centroid, this point is guaranteed to lie on the polygon; for simplicity, we store it in a column named centroid and refer to it as the centroid.

polygon_area <- uga_district %>%
  select(district = shapeName, geometry) %>%
  st_cast("POLYGON") %>%
  mutate(area = st_area(.))

uga_district2 <- polygon_area %>%
  group_by(district) %>%
  filter(area == max(area)) %>%
  ungroup() %>%
  select(-c(area)) %>%
  mutate(centroid = st_point_on_surface(geometry))
glimpse(uga_district2)
Rows: 137
Columns: 3
$ district <chr> "Bugiri", "Buvuma", "Mukono", "Kalanga", "Mayuge", "Namayingo…
$ geometry <POLYGON [m]> POLYGON ((606271.3 63711.25..., POLYGON ((541850.3 27…
$ centroid <POINT [m]> POINT (585352.6 58716.74), POINT (532387.5 23241.58), P…

Examining the map confirms that each centroid falls within its district polygon.

tmap_mode('view')
tm_shape(uga_district2) +
  tm_borders() +
  tm_shape(st_sf(geometry = uga_district2$centroid)) +
  tm_dots(col = "red") +
  tm_view(set.zoom.limits = c(6,9))

In this step, we identify districts present in the FinScope dataset but missing from the GeoBoundaries data using the anti_join() function. The output reveals that ‘Rubirizi’ is in the FinScope dataset but not in GeoBoundaries due to a misspelling in the latter.

unmatched1 <- fin_df6 %>% 
  anti_join(uga_district2, by = "district") %>%
  arrange(district) %>%
  select(district)
unmatched1
# A tibble: 1 × 1
  district
  <chr>   
1 Rubirizi

Next, we identify districts present in the GeoBoundaries data but missing from the FinScope dataset using the anti_join() function. Apart from the misspelt ‘Rubirzi’, the output lists districts that were not surveyed by FinScope.

unmatched2 <- uga_district2 %>% 
  st_drop_geometry() %>%
  anti_join(fin_df6, by = "district") %>%
  arrange(district) %>%
  select(district) %>%
  mutate(`Unmatched Districts` = "Unmatched")
unmatched2
# A tibble: 15 × 2
   district      `Unmatched Districts`
   <chr>         <chr>                
 1 Bukwo         Unmatched            
 2 Butambala     Unmatched            
 3 Kalaki        Unmatched            
 4 Kalanga       Unmatched            
 5 Karenga       Unmatched            
 6 Kazo          Unmatched            
 7 Kitagwenga    Unmatched            
 8 Lake Victoria Unmatched            
 9 Madi Okollo   Unmatched            
10 Nakasongola   Unmatched            
11 Ntoroko       Unmatched            
12 Obongi        Unmatched            
13 Rubirzi       Unmatched            
14 Rwampara      Unmatched            
15 Terego        Unmatched            
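
The two anti-join checks can be sketched in base R with setdiff(). The district vectors below are a small hypothetical subset used only for illustration. As an optional extra that is not part of the original workflow, adist() from the utils package flags unmatched names that differ by only a few characters, which helps to separate misspellings from districts that were never surveyed.

```r
# Hypothetical district names standing in for the two sources.
survey_districts   <- c("Abim", "Adjumani", "Rubirizi")
boundary_districts <- c("Abim", "Adjumani", "Rubirzi", "Bukwo")

# In the survey but not in the boundary file (analogous to the first anti-join).
only_in_survey <- setdiff(survey_districts, boundary_districts)

# In the boundary file but not in the survey (analogous to the second anti-join).
only_in_boundary <- setdiff(boundary_districts, survey_districts)

# Edit distance between the unmatched names on each side;
# small values suggest a spelling variant rather than a missing district.
edit_dist <- adist(only_in_survey, only_in_boundary)
dimnames(edit_dist) <- list(only_in_survey, only_in_boundary)
edit_dist
```

Any pair flagged this way still needs a manual check before names are recoded.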

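In this step, we corrected the spelling of “Rubirizi” in the geospatial dataset, joined the list of unmatched districts, and labelled each district as matched or unmatched. Because the spelling is corrected before the join, Rubirizi no longer matches any entry in the unmatched list and is therefore labelled as matched.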

uga_district3 <- uga_district2 %>%
  mutate(district = case_when(district == "Rubirzi" ~ "Rubirizi",
                              TRUE ~ district)) %>%
  left_join(unmatched2, by = 'district') %>%
  mutate(`Unmatched Districts` = factor(case_when(`Unmatched Districts`=="Unmatched" ~ "Unmatched",
                                                        TRUE ~ "Matched")))

The map below shows the 14 districts (coloured red) that were not surveyed, that is, the 15 unmatched names less the misspelt Rubirizi. Since these districts are unrepresented in the survey, they are removed in the next step.

tmap_mode('plot')
tm_shape(uga_district3) +
  tm_polygons("Unmatched Districts", palette = c("lightgrey", "red")) +  
  tm_shape(uga_district3 %>% filter(`Unmatched Districts` == "Unmatched")) +
  tm_text("district", size = 0.6, col = "black", 
          remove.overlap = TRUE) +
  tm_layout(legend.position = c("left", "top"))

tmap_mode('plot')

We retained matched districts using filter() and removed the ‘Unmatched Districts’ variable, as it was no longer needed. We then performed a left join to merge the aggregated FinScope dataset with the geospatial data.

uga_fin_sf <- uga_district3 %>%
  filter(`Unmatched Districts` == "Matched") %>%
  select(-c(`Unmatched Districts`)) %>%
  left_join(fin_df6, by = 'district')
glimpse(uga_fin_sf)
Rows: 123
Columns: 29
$ district              <chr> "Bugiri", "Buvuma", "Mukono", "Mayuge", "Namayin…
$ geometry              <POLYGON [m]> POLYGON ((606271.3 63711.25..., POLYGON …
$ centroid              <POINT [m]> POINT (585352.6 58716.74), POINT (532387.5…
$ fin_inclusion         <dbl> 1.866667, 2.400000, 3.360000, 1.700000, 2.800000…
$ age16_24              <dbl> 0.3000000, 0.0000000, 0.2800000, 0.1750000, 0.15…
$ age25_34              <dbl> 0.1666667, 0.2000000, 0.3800000, 0.2500000, 0.20…
$ age35_44              <dbl> 0.2666667, 0.3000000, 0.0600000, 0.2250000, 0.30…
$ age45_54              <dbl> 0.20000000, 0.40000000, 0.14000000, 0.10000000, …
$ gender_female         <dbl> 0.7000000, 0.5000000, 0.5600000, 0.6000000, 0.55…
$ education_pri         <dbl> 0.5333333, 0.4000000, 0.4200000, 0.6750000, 0.70…
$ education_sec         <dbl> 0.2666667, 0.2000000, 0.3800000, 0.1000000, 0.20…
$ education_voc         <dbl> 0.03333333, 0.00000000, 0.14000000, 0.02500000, …
$ education_deg         <dbl> 0.00000000, 0.00000000, 0.02000000, 0.00000000, …
$ household_big         <dbl> 0.4666667, 0.3000000, 0.2000000, 0.2250000, 0.35…
$ is_rural              <dbl> 0.6666667, 1.0000000, 0.6000000, 0.7500000, 1.00…
$ employment_formal     <dbl> 0.00000000, 0.00000000, 0.10000000, 0.00000000, …
$ employment_self       <dbl> 0.4827586, 0.7777778, 0.6600000, 0.5641026, 0.90…
$ employment_unemployed <dbl> 0.06896552, 0.00000000, 0.12000000, 0.07692308, …
$ is_agribusiness       <dbl> 0.3333333, 0.4000000, 0.4000000, 0.3000000, 0.80…
$ earned_low            <dbl> 0.4230769, 0.6250000, 0.4897959, 0.5714286, 0.63…
$ earned_med            <dbl> 0.1538462, 0.2500000, 0.2244898, 0.1071429, 0.26…
$ earned_high           <dbl> 0.00000000, 0.00000000, 0.04081633, 0.00000000, …
$ income_source_cnt     <dbl> 1.366667, 1.700000, 1.920000, 1.225000, 1.700000…
$ own_mobile_phone      <dbl> 0.7333333, 0.8000000, 0.8800000, 0.6000000, 0.75…
$ access_internet       <dbl> 0.06666667, 0.10000000, 0.44000000, 0.07500000, …
$ digital_literacy      <dbl> 0.8333333, 0.9000000, 1.4600000, 0.7000000, 1.10…
$ finliteracy_plan      <dbl> 0.4000000, 0.3600000, 0.7600000, 0.3150000, 0.67…
$ finliteracy_save      <dbl> 0.3444444, 0.4333333, 0.5466667, 0.4666667, 0.71…
$ finliteracy_aware     <dbl> 0.16666667, 0.21666667, 0.55333333, 0.12500000, …

4. Exploratory Data Analysis

Histograms of the 25 explanatory variables were plotted and arranged in a grid using ggarrange() from the ggpubr package.

age16_24 <- ggplot(data = uga_fin_sf, aes(x = age16_24)) + 
  geom_histogram(bins=20, color="black", fill="light blue")

age25_34 <- ggplot(data = uga_fin_sf, aes(x = age25_34)) + 
  geom_histogram(bins=20, color="black", fill="light blue")

age35_44 <- ggplot(data = uga_fin_sf, aes(x = age35_44)) + 
  geom_histogram(bins=20, color="black", fill="light blue")

age45_54 <- ggplot(data = uga_fin_sf, aes(x = age45_54)) + 
  geom_histogram(bins=20, color="black", fill="light blue")

gender_female <- ggplot(data = uga_fin_sf, aes(x = gender_female)) + 
  geom_histogram(bins=20, color="black", fill="light blue")

education_pri <- ggplot(data = uga_fin_sf, aes(x = education_pri)) + 
  geom_histogram(bins=20, color="black", fill="light blue")

education_sec <- ggplot(data = uga_fin_sf, aes(x = education_sec)) + 
  geom_histogram(bins=20, color="black", fill="light blue")

education_voc <- ggplot(data = uga_fin_sf, aes(x = education_voc)) + 
  geom_histogram(bins=20, color="black", fill="light blue")

education_deg <- ggplot(data = uga_fin_sf, aes(x = education_deg)) + 
  geom_histogram(bins=20, color="black", fill="light blue")

household_big <- ggplot(data = uga_fin_sf, aes(x = household_big)) + 
  geom_histogram(bins=20, color="black", fill="light blue")

is_rural <- ggplot(data = uga_fin_sf, aes(x = is_rural)) + 
  geom_histogram(bins=20, color="black", fill="light blue")

employment_formal <- ggplot(data = uga_fin_sf, aes(x = employment_formal)) + 
  geom_histogram(bins=20, color="black", fill="light blue")

employment_self <- ggplot(data = uga_fin_sf, aes(x = employment_self)) + 
  geom_histogram(bins=20, color="black", fill="light blue")

employment_unemployed <- ggplot(data = uga_fin_sf, aes(x = employment_unemployed)) + 
  geom_histogram(bins=20, color="black", fill="light blue")

is_agribusiness <- ggplot(data = uga_fin_sf, aes(x = is_agribusiness)) + 
  geom_histogram(bins=20, color="black", fill="light blue")

earned_low <- ggplot(data = uga_fin_sf, aes(x = earned_low)) + 
  geom_histogram(bins=20, color="black", fill="light blue")

earned_med <- ggplot(data = uga_fin_sf, aes(x = earned_med)) + 
  geom_histogram(bins=20, color="black", fill="light blue")

earned_high <- ggplot(data = uga_fin_sf, aes(x = earned_high)) + 
  geom_histogram(bins=20, color="black", fill="light blue")

income_source_cnt <- ggplot(data = uga_fin_sf, aes(x = income_source_cnt)) + 
  geom_histogram(bins=20, color="black", fill="light blue")

own_mobile_phone <- ggplot(data = uga_fin_sf, aes(x = own_mobile_phone)) + 
  geom_histogram(bins=20, color="black", fill="light blue")

access_internet <- ggplot(data = uga_fin_sf, aes(x = access_internet)) + 
  geom_histogram(bins=20, color="black", fill="light blue")

digital_literacy <- ggplot(data = uga_fin_sf, aes(x = digital_literacy)) + 
  geom_histogram(bins=20, color="black", fill="light blue")

finliteracy_plan <- ggplot(data = uga_fin_sf, aes(x = finliteracy_plan)) + 
  geom_histogram(bins=20, color="black", fill="light blue")

finliteracy_save <- ggplot(data = uga_fin_sf, aes(x = finliteracy_save)) + 
  geom_histogram(bins=20, color="black", fill="light blue")

finliteracy_aware <- ggplot(data = uga_fin_sf, aes(x = finliteracy_aware)) + 
  geom_histogram(bins=20, color="black", fill="light blue")

ggarrange(age16_24, age25_34, age35_44, age45_54, gender_female, education_pri, 
          education_sec, education_voc, education_deg, household_big, is_rural, 
          employment_formal, employment_self, employment_unemployed, is_agribusiness, 
          earned_low, earned_med, earned_high, income_source_cnt, own_mobile_phone, 
          access_internet, digital_literacy, finliteracy_plan, finliteracy_save, 
          finliteracy_aware, ncol = 4, nrow = 7)
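
As a more compact alternative to writing one ggplot() call per variable, the histograms could be generated in a loop. The sketch below is an assumption-laden illustration: it uses a toy data frame with three made-up columns in place of uga_fin_sf, and it only draws the plots if ggplot2 and ggpubr are installed. The .data pronoun lets aes() look up a column by its name as a string.

```r
# Toy stand-in for uga_fin_sf: three hypothetical proportion columns.
set.seed(1)
toy <- data.frame(age16_24 = runif(50),
                  age25_34 = runif(50),
                  is_rural = runif(50))
hist_vars <- names(toy)

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggpubr", quietly = TRUE)) {
  library(ggplot2)
  # One histogram per variable name; .data[[v]] looks the column up by string.
  plots <- lapply(hist_vars, function(v) {
    ggplot(toy, aes(x = .data[[v]])) +
      geom_histogram(bins = 20, color = "black", fill = "light blue") +
      labs(x = v)
  })
  # ggarrange() accepts a list of plots through its plotlist argument.
  hist_grid <- ggpubr::ggarrange(plotlist = plots, ncol = 2, nrow = 2)
}
```

With the real data, hist_vars would hold the 25 variable names and the grid would use ncol = 4 and nrow = 7 as above.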

The map illustrates the geospatial distribution of financial inclusion across Uganda, highlighting disparities in financial inclusion levels among districts.

tmap_mode("view")
tm_shape(uga_fin_sf) +
  tm_polygons(alpha = 0.4) +
  tm_shape(uga_fin_sf) +  
  tm_dots(col = "fin_inclusion",
          alpha = 0.6,
          style="quantile",
          title = "Financial Inclusion") +
  tm_view(set.zoom.limits = c(6,9))

To prevent multicollinearity in multiple linear regression, we examined correlations between variables to identify those with high interdependence. Some variables, such as mobile phone ownership, internet access, and digital literacy, showed high correlations. While these variables are not removed at this stage, they will be closely monitored during subsequent multicollinearity testing.

A visualization of the correlations among variables was created using corrplot.

corrplot(cor(fin_df6[, 2:27]))

An alternative visualization of the correlations was generated using the ggcorrmat() function from the ggstatsplot package, displaying both the magnitude and statistical significance of each correlation.

ggcorrmat(fin_df6[, 2:27])
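
Alongside the plots, highly correlated pairs can be listed explicitly. The following base R sketch uses toy data in which one variable is deliberately constructed to correlate with another; the 0.8 cut-off is an assumption chosen for illustration and is not a threshold used in this study.

```r
set.seed(123)
n <- 100
phone    <- runif(n)
internet <- 0.8 * phone + rnorm(n, sd = 0.05)  # built to correlate with phone
rural    <- runif(n)                           # unrelated to the other two
toy <- data.frame(own_mobile_phone = phone,
                  access_internet  = internet,
                  is_rural         = rural)

cor_mat <- cor(toy, use = "pairwise.complete.obs")

# Keep each pair once (upper triangle) and list those above the cut-off.
threshold <- 0.8
idx <- which(abs(cor_mat) > threshold & upper.tri(cor_mat), arr.ind = TRUE)
high_pairs <- data.frame(var1 = rownames(cor_mat)[idx[, "row"]],
                         var2 = colnames(cor_mat)[idx[, "col"]],
                         r    = cor_mat[idx])
high_pairs
```

A listing like this makes it easier to decide which variables to watch in the VIF checks that follow.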

5. Multiple Linear Regression Model (Non-spatial model)

5.1 Initial MLR Model

We built an initial multiple linear regression (MLR) model with financial inclusion as the dependent variable, regressing it on all explanatory variables using the lm() function.

The model summary was generated using tbl_regression() from the gtsummary package. The model achieved an adjusted R² of 0.835 and was statistically significant (p-value < .05). However, when examining the statistical significance of individual independent variables, we found that while some variables were significant (e.g., proportion aged 25–34, proportion involved in agricultural businesses), others were not (e.g., employment-related variables, proportion with large households).

finincl_mlr1 <- lm(fin_inclusion ~ age16_24 + age25_34 + age35_44 + age45_54 +
                    gender_female + education_pri + education_sec + education_voc +
                    education_deg + household_big + is_rural + employment_formal +
                    employment_self + employment_unemployed +
                    is_agribusiness + earned_low + earned_med + earned_high +
                    income_source_cnt + own_mobile_phone +
                    access_internet + digital_literacy +
                    finliteracy_plan + finliteracy_save + finliteracy_aware,
                data = uga_fin_sf)

tbl_regression(finincl_mlr1, intercept = FALSE) %>% 
  add_glance_source_note(label = list(sigma ~ "\U03C3"),
                         include = c(r.squared, adj.r.squared, 
                                     AIC, statistic,
                                     p.value, sigma))
Characteristic           Beta    95% CI¹        p-value
age16_24                 0.25    -0.57, 1.1     0.5
age25_34                 0.77    0.05, 1.5      0.036
age35_44                 1.4     0.54, 2.3      0.002
age45_54                 -0.51   -1.4, 0.38     0.3
gender_female            -0.18   -0.74, 0.39    0.5
education_pri            -0.01   -0.58, 0.56    >0.9
education_sec            0.61    -0.13, 1.4     0.11
education_voc            -0.16   -1.2, 0.92     0.8
education_deg            -1.4    -3.3, 0.45     0.13
household_big            0.06    -0.31, 0.44    0.7
is_rural                 -0.16   -0.45, 0.13    0.3
employment_formal        -0.28   -1.5, 0.96     0.7
employment_self          -0.19   -0.81, 0.42    0.5
employment_unemployed    -0.70   -1.9, 0.53     0.3
is_agribusiness          0.46    0.13, 0.79     0.007
earned_low               0.42    -0.24, 1.1     0.2
earned_med               1.2     0.38, 1.9      0.004
earned_high              1.4     0.20, 2.6      0.022
income_source_cnt        0.04    -0.20, 0.27    0.8
own_mobile_phone         0.80    -0.11, 1.7     0.085
access_internet          0.43    -0.62, 1.5     0.4
digital_literacy         -0.12   -0.94, 0.69    0.8
finliteracy_plan         0.67    0.04, 1.3      0.039
finliteracy_save         0.77    0.26, 1.3      0.003
finliteracy_aware        1.9     1.2, 2.7       <0.001
R² = 0.869; Adjusted R² = 0.835; AIC = 86.3; Statistic = 25.8; p-value = <0.001; σ = 0.311
¹ CI = Confidence Interval

5.2 Checking Multicollinearity of Initial Model

To assess multicollinearity, we used the ols_vif_tol() function to calculate the Variance Inflation Factor (VIF) for each variable. The VIF values for Digital Literacy and Internet Access exceeded 10, while Mobile Ownership was close to 10, indicating multicollinearity.

ols_vif_tol(finincl_mlr1)
               Variables  Tolerance       VIF
1               age16_24 0.45262188  2.209350
2               age25_34 0.52764860  1.895201
3               age35_44 0.55006196  1.817977
4               age45_54 0.55768217  1.793136
5          gender_female 0.62713154  1.594562
6          education_pri 0.34623566  2.888206
7          education_sec 0.23213969  4.307751
8          education_voc 0.48524042  2.060834
9          education_deg 0.44388697  2.252826
10         household_big 0.65467847  1.527467
11              is_rural 0.62738535  1.593917
12     employment_formal 0.44696638  2.237305
13       employment_self 0.24855663  4.023228
14 employment_unemployed 0.59664145  1.676048
15       is_agribusiness 0.45336921  2.205708
16            earned_low 0.24315704  4.112569
17            earned_med 0.19015660  5.258823
18           earned_high 0.62936446  1.588904
19     income_source_cnt 0.44484330  2.247983
20      own_mobile_phone 0.10347893  9.663803
21       access_internet 0.09916224 10.084484
22      digital_literacy 0.05039740 19.842294
23      finliteracy_plan 0.31488665  3.175746
24      finliteracy_save 0.35251710  2.836742
25     finliteracy_aware 0.36002681  2.777571
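
Tolerance and VIF are reciprocals of each other, and the VIF of a predictor equals 1 / (1 − R²), where R² comes from regressing that predictor on all the others. The sketch below first checks this against the Digital Literacy row of the table above, then computes VIF by hand on toy data in which x2 is deliberately built to be collinear with x1. The toy variables are hypothetical, and the code is not how olsrr computes the values internally.

```r
# Tolerance and VIF are reciprocals; tolerance taken from the table above.
tolerance_digital <- 0.05039740
vif_digital <- 1 / tolerance_digital   # about 19.84, matching the VIF column

# By-hand VIF on toy data: regress each predictor on the remaining ones.
set.seed(7)
n  <- 150
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)          # deliberately collinear with x1
x3 <- rnorm(n)                         # independent of the others
X  <- data.frame(x1, x2, x3)

vif_by_hand <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})
round(vif_by_hand, 2)
```

In the toy data, x1 and x2 show inflated VIFs while x3 stays close to 1, mirroring the pattern seen for the digital access variables.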

5.3 Second MLR Model

Consequently, we removed the Digital Literacy variable, as it had the highest VIF value and because Mobile Ownership and Internet Access were deemed more directly associated with financial inclusion. In the second MLR model, all explanatory variables except Digital Literacy were included.

The second model achieved an adjusted R² of 0.837 and was statistically significant (p-value < .05). Likewise, we observed some variables were significant (e.g., proportion aged 25–34, proportion involved in agricultural businesses), while others were not (e.g., employment-related variables, proportion with large households).

finincl_mlr2 <- lm(fin_inclusion ~ age16_24 + age25_34 + age35_44 + age45_54 +
                    gender_female + education_pri + education_sec + education_voc +
                    education_deg + household_big + is_rural + employment_formal +
                    employment_self + employment_unemployed +
                    is_agribusiness + earned_low + earned_med + earned_high +
                    income_source_cnt + own_mobile_phone + access_internet +
                    finliteracy_plan + finliteracy_save + 
                    finliteracy_aware,
                data = uga_fin_sf)

tbl_regression(finincl_mlr2, intercept = FALSE) %>% 
  add_glance_source_note(label = list(sigma ~ "\U03C3"),
                         include = c(r.squared, adj.r.squared, 
                                     AIC, statistic,
                                     p.value, sigma))
Characteristic           Beta    95% CI¹        p-value
age16_24                 0.24    -0.57, 1.0     0.6
age25_34                 0.76    0.05, 1.5      0.037
age35_44                 1.5     0.56, 2.3      0.002
age45_54                 -0.53   -1.4, 0.35     0.2
gender_female            -0.18   -0.74, 0.38    0.5
education_pri            -0.03   -0.57, 0.50    0.9
education_sec            0.59    -0.14, 1.3     0.11
education_voc            -0.18   -1.2, 0.89     0.7
education_deg            -1.5    -3.3, 0.31     0.10
household_big            0.06    -0.31, 0.44    0.7
is_rural                 -0.16   -0.45, 0.13    0.3
employment_formal        -0.27   -1.5, 0.97     0.7
employment_self          -0.21   -0.81, 0.40    0.5
employment_unemployed    -0.69   -1.9, 0.54     0.3
is_agribusiness          0.47    0.14, 0.80     0.005
earned_low               0.40    -0.24, 1.0     0.2
earned_med               1.1     0.38, 1.9      0.004
earned_high              1.4     0.23, 2.6      0.019
income_source_cnt        0.03    -0.20, 0.27    0.8
own_mobile_phone         0.71    0.02, 1.4      0.045
access_internet          0.31    -0.37, 0.98    0.4
finliteracy_plan         0.66    0.03, 1.3      0.039
finliteracy_save         0.77    0.27, 1.3      0.003
finliteracy_aware        1.9     1.2, 2.7       <0.001
R² = 0.869; Adjusted R² = 0.837; AIC = 84.5; Statistic = 27.1; p-value = <0.001; σ = 0.309
¹ CI = Confidence Interval

5.4 Checking Multicollinearity of Second Model

We assessed multicollinearity again using ols_vif_tol(). Since the VIF values for all independent variables are now below 10, we conclude that no serious multicollinearity remains.

ols_vif_tol(finincl_mlr2)
               Variables Tolerance      VIF
1               age16_24 0.4581549 2.182668
2               age25_34 0.5331003 1.875820
3               age35_44 0.5573649 1.794157
4               age45_54 0.5666336 1.764809
5          gender_female 0.6276282 1.593300
6          education_pri 0.3771017 2.651805
7          education_sec 0.2385751 4.191553
8          education_voc 0.4921413 2.031937
9          education_deg 0.4722443 2.117548
10         household_big 0.6572598 1.521468
11              is_rural 0.6290053 1.589812
12     employment_formal 0.4498742 2.222844
13       employment_self 0.2517605 3.972029
14 employment_unemployed 0.5981586 1.671797
15       is_agribusiness 0.4644502 2.153083
16            earned_low 0.2497320 4.004292
17            earned_med 0.1909339 5.237415
18           earned_high 0.6358058 1.572807
19     income_source_cnt 0.4474371 2.234951
20      own_mobile_phone 0.1772264 5.642500
21       access_internet 0.2388015 4.187578
22      finliteracy_plan 0.3168609 3.155959
23      finliteracy_save 0.3527430 2.834925
24     finliteracy_aware 0.3605272 2.773716

5.5 Variable Selection in Third MLR Model

We used the ols_step_forward_p() function to perform stepwise forward selection, setting a p-value threshold of 0.05 so that a variable enters the model only if it is statistically significant at the step where it is added.

finincl_mlr3 <- ols_step_forward_p(finincl_mlr2,
                                   p_val = 0.05,
                                   details = FALSE)
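
To make the selection logic concrete, the sketch below implements a simplified p-value-based forward selection in base R on toy data. It is only an illustration of the general idea and is not the olsrr implementation; the function forward_select_p() and its p_enter argument are our own hypothetical names. At each step, every remaining candidate is added in turn, the candidate with the smallest p-value is kept if that p-value is below the threshold, and the procedure stops otherwise.

```r
set.seed(42)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
toy$y <- 2 * toy$x1 - 1.5 * toy$x2 + rnorm(n)   # only x1 and x2 matter

forward_select_p <- function(data, response, candidates, p_enter = 0.05) {
  selected <- character(0)
  repeat {
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    # p-value of each candidate when added to the current model.
    p_vals <- sapply(remaining, function(v) {
      fit <- lm(reformulate(c(selected, v), response = response), data = data)
      summary(fit)$coefficients[v, "Pr(>|t|)"]
    })
    best <- names(which.min(p_vals))
    if (p_vals[[best]] >= p_enter) break       # nothing left that qualifies
    selected <- c(selected, best)
  }
  selected
}

selected <- forward_select_p(toy, "y", c("x1", "x2", "x3", "x4"))
selected
```

Because the threshold is applied only when a variable enters, a variable admitted early can end up with a p-value above the threshold once later variables join the model.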

The plot below illustrates the stepwise forward selection process, showing incremental changes in Adjusted R², AIC, and RMSE at each step.

plot(finincl_mlr3)

The summary output shows that the final model includes 10 variables, achieves an adjusted R² of 0.842, and is statistically significant (p-value < .05). All variables except high earned income have p-values below 0.05; high earned income narrowly misses the threshold (p = 0.065). The explanatory variables in the model are:

  • Proportion aged 25–34
  • Proportion aged 35–44
  • Proportion with secondary education
  • Proportion involved in agricultural businesses
  • Proportion with medium earned income
  • Proportion with high earned income
  • Proportion owning mobile phones
  • Mean financial literacy score for planning/budgeting
  • Mean financial literacy score for saving behaviors
  • Mean financial literacy score for awareness of financial products

tbl_regression(finincl_mlr3$model, intercept = FALSE) %>% 
  add_glance_source_note(label = list(sigma ~ "\U03C3"),
                         include = c(r.squared, adj.r.squared, 
                                     AIC, statistic,
                                     p.value, sigma))
Characteristic       Beta    95% CI¹        p-value
finliteracy_aware    1.9     1.4, 2.5       <0.001
finliteracy_save     0.95    0.50, 1.4      <0.001
earned_med           0.76    0.31, 1.2      0.001
age35_44             1.4     0.72, 2.2      <0.001
age25_34             0.92    0.35, 1.5      0.002
earned_high          0.95    -0.06, 2.0     0.065
finliteracy_plan     0.86    0.34, 1.4      0.001
education_sec        0.63    0.09, 1.2      0.022
is_agribusiness      0.48    0.21, 0.75     <0.001
own_mobile_phone     0.55    0.01, 1.1      0.045
R² = 0.855; Adjusted R² = 0.842; AIC = 68.8; Statistic = 66.1; p-value = <0.001; σ = 0.304
¹ CI = Confidence Interval

5.6 Checking Assumptions of MLR Model

Although we confirmed that multicollinearity is not present in the model, we conducted additional checks to ensure that other assumptions of multiple linear regression (MLR) were not violated.

We used the ols_plot_resid_fit() function to test the assumptions of linearity and additivity in the relationships between the dependent and independent variables. The resulting plot shows data points scattered around the zero line, suggesting that the relationships between the dependent and independent variables are linear.

ols_plot_resid_fit(finincl_mlr3$model)

The plot below, created with the ols_plot_resid_hist() function, shows that the residuals approximate a normal distribution.

ols_plot_resid_hist(finincl_mlr3$model)

To statistically assess normality, we used the ols_test_normality() function. The p-values from the Shapiro-Wilk, Kolmogorov-Smirnov, and Anderson-Darling tests are all greater than the alpha level of 0.05, so for these tests we do not reject the null hypothesis of normality. The Cramer-von Mises test is the exception, with a reported p-value below 0.05. Since the other three tests and the residual histogram agree, we treat the residuals as approximately normally distributed.

ols_test_normality(finincl_mlr3$model)
-----------------------------------------------
       Test             Statistic       pvalue  
-----------------------------------------------
Shapiro-Wilk              0.9918         0.6887 
Kolmogorov-Smirnov        0.0617         0.7381 
Cramer-von Mises         21.9979         0.0000 
Anderson-Darling          0.3313         0.5095 
-----------------------------------------------
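
For readers without olsrr, two of these tests are available in base R. The sketch below runs them on the residuals of a toy regression (the data are simulated and unrelated to the study). The Kolmogorov-Smirnov call is only approximate here, because the mean and standard deviation are estimated from the same residuals before the comparison with the standard normal distribution.

```r
set.seed(2024)
toy <- data.frame(x = rnorm(120))
toy$y <- 1 + 2 * toy$x + rnorm(120, sd = 0.3)
fit <- lm(y ~ x, data = toy)
res <- residuals(fit)

# Shapiro-Wilk test of normality.
sw <- shapiro.test(res)

# Kolmogorov-Smirnov test against N(0, 1) after standardising the residuals.
ks <- ks.test(as.numeric(scale(res)), "pnorm")

p_values <- c(shapiro_wilk = sw$p.value, kolmogorov_smirnov = ks$p.value)
p_values
```

A p-value above the chosen alpha level means the test gives no evidence against normality.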

5.7 Geovisualisation of Residuals

To visualise the residuals of the MLR model on the map, we first extracted the residuals from the model, converted them into a dataframe using as.data.frame(), and renamed the variable for clarity. We then joined the residuals with the uga_fin_sf dataframe, renaming the variable again for easy readability.

mlr_output <- as.data.frame(finincl_mlr3$model$residuals) %>%
  rename(mlr_residuals = `finincl_mlr3$model$residuals`)

uga_fin_sf1 <- cbind(uga_fin_sf, mlr_output$mlr_residuals) %>%
  rename(mlr_residuals = `mlr_output.mlr_residuals`)
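
Binding residuals column-wise assumes that the residual vector has one entry per row of the data, in the same order. That holds here because all 123 districts enter the model, but it would break if lm() silently dropped rows with missing values. The toy sketch below, which is not part of the original workflow, shows the difference and how na.action = na.exclude keeps residuals aligned with the original rows.

```r
# Toy data with one missing predictor value (row 3).
toy <- data.frame(y = c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8),
                  x = c(1, 2, NA, 4, 5, 6))

# Default behaviour drops the incomplete row, so residuals are shorter than the data.
fit_omit <- lm(y ~ x, data = toy)
n_resid_omit <- length(residuals(fit_omit))

# na.exclude pads residuals with NA so they line up with the original rows.
fit_excl <- lm(y ~ x, data = toy, na.action = na.exclude)
toy$mlr_residuals <- residuals(fit_excl)
toy
```

Assigning the residuals directly as a new column in this way also avoids the intermediate data frame and the second rename.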

The plot indicates that certain clusters of districts have positive residuals (e.g. districts in the west of Uganda), suggesting a possible presence of spatial autocorrelation.

tmap_mode("view")
tm_shape(uga_fin_sf1) +
  tm_polygons(alpha = 0.4) +
  tm_shape(uga_fin_sf1) +  
  tm_dots(col = "mlr_residuals",
          alpha = 0.6,
          style="quantile",
          title = "MLR Residuals") +
  tm_view(set.zoom.limits = c(6,9))

5.8 Testing for Spatial Autocorrelation

We used the sfdep package to construct the spatial weights. We applied st_knn() to the centroids, specifying 6 nearest neighbours for each district, and then computed row-standardised weights using st_weights() with style = "W".

uga_fin_sf1 <- uga_fin_sf1 %>%
  mutate(nb = st_knn(centroid, k = 6, longlat = FALSE),
         wt = st_weights(nb, style = "W"),
         .before = "geometry")

To test formally for spatial autocorrelation in the residuals, we performed a Global Moran’s I permutation test.

  • H₀: The residuals are randomly distributed in space (no spatial autocorrelation).
  • H₁: The residuals are spatially dependent (spatial autocorrelation is present).

The test returns a p-value below the alpha level of 0.05, so we reject the null hypothesis that the residuals are randomly distributed. The observed statistic also ranks highest among all 100 values (the observed value plus 99 simulations). Since the observed Global Moran’s I of 0.11023 is greater than 0, we infer that the residuals are spatially clustered.

set.seed(1234)
global_moran_perm(uga_fin_sf1$mlr_residuals,
                  uga_fin_sf1$nb,
                  uga_fin_sf1$wt,
                  alternative = "two.sided",
                  nsim = 99)

    Monte-Carlo simulation of Moran I

data:  x 
weights: listw  
number of simulations + 1: 100 

statistic = 0.11023, observed rank = 100, p-value < 2.2e-16
alternative hypothesis: two.sided
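
The mechanics of the permutation test can be sketched by hand in base R. The example below uses simulated point locations and values with a deliberate west-to-east trend, so it is only an illustration and does not reproduce the sfdep computation. The helper names knn_weights() and morans_i() are our own. With 99 simulations, the smallest one-sided pseudo p-value this procedure can return is 1/100 = 0.01.

```r
set.seed(1234)
n <- 60
coords <- cbind(runif(n), runif(n))        # hypothetical centroids
x <- coords[, 1] + rnorm(n, sd = 0.1)      # values with a west-east trend

# Row-standardised k-nearest-neighbour weights matrix.
knn_weights <- function(coords, k) {
  d <- as.matrix(dist(coords))
  diag(d) <- Inf                           # a point is not its own neighbour
  W <- matrix(0, nrow(d), nrow(d))
  for (i in seq_len(nrow(d))) W[i, order(d[i, ])[seq_len(k)]] <- 1 / k
  W
}

# Global Moran's I for values x and weights matrix W.
morans_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

W <- knn_weights(coords, k = 6)
i_obs <- morans_i(x, W)

# Permutation test: shuffle values over locations and recompute the statistic.
nsim <- 99
i_perm <- replicate(nsim, morans_i(sample(x), W))
p_greater <- (sum(i_perm >= i_obs) + 1) / (nsim + 1)
c(moran_i = i_obs, pseudo_p = p_greater)
```

A larger nsim gives a finer-grained pseudo p-value at the cost of more computation.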

6. Geographically Weighted Regression (GWR) Model

6.1 Fixed-Bandwidth GWR Model

Given the presence of spatial autocorrelation, as demonstrated by the Global Moran’s I permutation test, we proceeded to enhance the final MLR model by incorporating spatial components.

We conduct the following steps to build a fixed-bandwidth GWR model.

We used the bw.gwr() function from the GWmodel package, setting the adaptive argument to FALSE, to determine the optimal fixed bandwidth for the model. We chose the bisquare kernel, a common choice in GWR.

Using the cross-validation (CV) approach, the recommended fixed bandwidth is 724369.1 metres (about 724 km).

bw_fixed <- bw.gwr(fin_inclusion ~ age25_34 + age35_44 + 
                     education_sec + is_agribusiness +
                     earned_med + earned_high + own_mobile_phone +
                     finliteracy_plan + finliteracy_save + finliteracy_aware,
                   data = uga_fin_sf1,
                   approach = "CV",
                   kernel = "bisquare",
                   adaptive = FALSE,
                   longlat = FALSE)
Fixed bandwidth: 447943 CV score: 13.07855 
Fixed bandwidth: 276899.4 CV score: 14.4272 
Fixed bandwidth: 553653.8 CV score: 12.61 
Fixed bandwidth: 618986.6 CV score: 12.4853 
Fixed bandwidth: 659364.5 CV score: 12.44387 
Fixed bandwidth: 684319.5 CV score: 12.42607 
Fixed bandwidth: 699742.5 CV score: 12.41726 
Fixed bandwidth: 709274.4 CV score: 12.41252 
Fixed bandwidth: 715165.5 CV score: 12.40983 
Fixed bandwidth: 718806.3 CV score: 12.40826 
Fixed bandwidth: 721056.5 CV score: 12.40732 
Fixed bandwidth: 722447.2 CV score: 12.40675 
Fixed bandwidth: 723306.7 CV score: 12.4064 
Fixed bandwidth: 723837.9 CV score: 12.40619 
Fixed bandwidth: 724166.2 CV score: 12.40606 
Fixed bandwidth: 724369.1 CV score: 12.40598 
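
To show what the bandwidth means in practice, the sketch below writes out the bisquare kernel, w = (1 − (d / b)²)² for d < b and 0 otherwise, and uses it in a single weighted least squares fit centred on one location. The coordinates, variables, and focus point are simulated and hypothetical, and one weighted lm() call is only a rough stand-in for a single local regression, not the GWmodel implementation.

```r
# Bisquare kernel: the weight decays smoothly to zero at the bandwidth.
bisquare <- function(d, bandwidth) {
  ifelse(d < bandwidth, (1 - (d / bandwidth)^2)^2, 0)
}

bw <- 724369.1                               # metres, from the CV search above
d  <- c(0, 100000, 300000, 724369.1, 800000) # example distances in metres
kernel_weights <- bisquare(d, bw)
round(kernel_weights, 3)

# One local fit: weighted least squares centred on a hypothetical point.
set.seed(99)
toy <- data.frame(east  = runif(80, 0, 5e5),
                  north = runif(80, 0, 5e5),
                  x     = rnorm(80))
toy$y <- 1 + 0.5 * toy$x + rnorm(80, sd = 0.2)
focus <- c(250000, 250000)
dist_to_focus <- sqrt((toy$east - focus[1])^2 + (toy$north - focus[2])^2)
local_fit <- lm(y ~ x, data = toy, weights = bisquare(dist_to_focus, bw))
coef(local_fit)
```

Because 724 km is of the same order as the extent of the study area, most districts receive a non-zero weight in every local fit, so the local estimates are likely to stay close to the global ones.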

Using the optimal fixed bandwidth, we calibrated the GWR model with gwr.basic(). The results indicate that the AIC of the fixed-bandwidth GWR model is 50.4, which is lower than that of the global MLR model (AIC = 68.8). Additionally, the GWR model achieved an adjusted R² of 0.843, marginally above the 0.842 of the global model.

gwr_fixed <- gwr.basic(fin_inclusion ~ age25_34 + age35_44 + 
                         education_sec + is_agribusiness +
                         earned_med + earned_high + own_mobile_phone +
                         finliteracy_plan + finliteracy_save + finliteracy_aware,
                       data = uga_fin_sf1,
                       bw = bw_fixed, 
                       kernel = 'bisquare', 
                       longlat = FALSE)
gwr_fixed
   ***********************************************************************
   *                       Package   GWmodel                             *
   ***********************************************************************
   Program starts at: 2024-11-11 00:35:15.421247 
   Call:
   gwr.basic(formula = fin_inclusion ~ age25_34 + age35_44 + education_sec + 
    is_agribusiness + earned_med + earned_high + own_mobile_phone + 
    finliteracy_plan + finliteracy_save + finliteracy_aware, 
    data = uga_fin_sf1, bw = bw_fixed, kernel = "bisquare", longlat = FALSE)

   Dependent (y) variable:  fin_inclusion
   Independent variables:  age25_34 age35_44 education_sec is_agribusiness earned_med earned_high own_mobile_phone finliteracy_plan finliteracy_save finliteracy_aware
   Number of data points: 123
   ***********************************************************************
   *                    Results of Global Regression                     *
   ***********************************************************************

   Call:
    lm(formula = formula, data = data)

   Residuals:
     Min       1Q   Median       3Q      Max 
-0.75863 -0.24213  0.01064  0.19921  0.75298 

   Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
   (Intercept)        -0.6543     0.1976  -3.312  0.00125 ** 
   age25_34            0.9206     0.2863   3.215  0.00170 ** 
   age35_44            1.4474     0.3687   3.925  0.00015 ***
   education_sec       0.6266     0.2690   2.330  0.02162 *  
   is_agribusiness     0.4839     0.1358   3.563  0.00054 ***
   earned_med          0.7636     0.2312   3.302  0.00129 ** 
   earned_high         0.9467     0.5078   1.864  0.06489 .  
   own_mobile_phone    0.5519     0.2721   2.028  0.04494 *  
   finliteracy_plan    0.8575     0.2607   3.289  0.00134 ** 
   finliteracy_save    0.9507     0.2271   4.186 5.68e-05 ***
   finliteracy_aware   1.9384     0.2811   6.897 3.33e-10 ***

   ---Significance stars
   Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
   Residual standard error: 0.3043 on 112 degrees of freedom
   Multiple R-squared: 0.8551
   Adjusted R-squared: 0.8422 
   F-statistic: 66.11 on 10 and 112 DF,  p-value: < 2.2e-16 
   ***Extra Diagnostic information
   Residual sum of squares: 10.36825
   Sigma(hat): 0.2927251
   AIC:  68.82621
   AICc:  71.66257
   BIC:  37.31863
   ***********************************************************************
   *          Results of Geographically Weighted Regression              *
   ***********************************************************************

   *********************Model calibration information*********************
   Kernel function: bisquare 
   Fixed bandwidth: 724369.1 
   Regression points: the same locations as observations are used.
   Distance metric: Euclidean distance metric is used.

   ****************Summary of GWR coefficient estimates:******************
                          Min.   1st Qu.    Median   3rd Qu.    Max.
   Intercept         -0.683016 -0.632200 -0.605856 -0.592403 -0.5261
   age25_34           0.668388  0.794241  0.878266  0.956548  1.0412
   age35_44           1.106205  1.292176  1.405400  1.567049  1.8719
   education_sec      0.421038  0.520655  0.610601  0.700123  0.8134
   is_agribusiness    0.395924  0.432783  0.448846  0.475894  0.5310
   earned_med         0.551214  0.710599  0.758881  0.826400  0.9363
   earned_high        0.520001  0.703192  0.825605  0.963703  1.2191
   own_mobile_phone   0.071497  0.427809  0.615259  0.702556  0.9120
   finliteracy_plan   0.830708  0.845049  0.859533  0.876469  0.9298
   finliteracy_save   0.761426  0.885904  0.927344  0.957869  1.0618
   finliteracy_aware  1.834788  1.931056  2.005183  2.045504  2.1275
   ************************Diagnostic information*************************
   Number of data points: 123 
   Effective number of parameters (2trace(S) - trace(S'S)): 16.87521 
   Effective degrees of freedom (n-2trace(S) + trace(S'S)): 106.1248 
   AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 71.45903 
   AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 50.40986 
   BIC (GWR book, Fotheringham, et al. 2002,GWR p. 61, eq. 2.34): -17.90227 
   Residual sum of squares: 9.655359 
   R-square value:  0.865082 
   Adjusted R-square value:  0.8434243 

   ***********************************************************************
   Program stops at: 2024-11-11 00:35:15.431977 

Since certain functions and outputs, such as local p-values of coefficients, are not available for sf objects, we need to build the same fixed-bandwidth GWR model using a Spatial object instead.

Following the GWmodel authors’ methodology, we convert the sf object to a Spatial object, then calibrate the GWR model as before. We use gwr.t.adjust() to compute and adjust the p-values, helping to reduce the risk of type I errors.

uga_fin_sp <- as(uga_fin_sf1, "Spatial")

gwr_fixed_sp <- gwr.basic(fin_inclusion ~ age25_34 + age35_44 + 
                         education_sec + is_agribusiness +
                         earned_med + earned_high + own_mobile_phone +
                         finliteracy_plan + finliteracy_save + finliteracy_aware,
                       data = uga_fin_sp,
                       bw = bw_fixed, 
                       kernel = 'bisquare', 
                       longlat = FALSE)

gwr_fixed_tadj <- gwr.t.adjust(gwr_fixed_sp)
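
For intuition about what a Bonferroni adjustment does to local p-values, here is a minimal base-R sketch using p.adjust(). The t-values are made up, the degrees of freedom are borrowed from the effective degrees of freedom in the printout above, and the sketch is not a reproduction of the internals of gwr.t.adjust(), which may count the number of tests differently.

```r
# Hypothetical local t-values for one coefficient at five locations
t_vals <- c(0.8, 2.1, 3.2, 4.5, 6.9)
edf    <- 106.1248   # effective degrees of freedom from the GWR printout

# Two-sided p-values from the t distribution
p_raw <- 2 * pt(abs(t_vals), df = edf, lower.tail = FALSE)

# Bonferroni: multiply each p-value by the number of tests and cap at 1
p_bonf <- p.adjust(p_raw, method = "bonferroni")

print(data.frame(t = t_vals,
                 p_raw = signif(p_raw, 3),
                 p_bonf = signif(p_bonf, 3)))
```

Adjusted values near 0 correspond to strong evidence against a zero coefficient, while values capped at 1 correspond to none.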

6.2 Consolidating GWR Model Output

In this step, we consolidate outputs from our GWR models to enable visualisation and further examination.

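First, we retrieve the SDF output from the fixed-bandwidth GWR model (sf object version), drop the geometry using st_drop_geometry(), convert it to a tibble with as_tibble(), and remove columns 2 to 11, which hold the local coefficient estimates for the ten predictors and would otherwise share names with the predictor columns in uga_fin_sf1.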

Next, we extract the Bonferroni-adjusted local coefficient p-values (the columns whose names contain _bo) from the Spatial object version of the GWR model. We treat these values as binary (0 or 1) and define a function, sig_recode, that labels values of 1 as “Significant” and all other values as “Non-significant”.

Finally, we column-bind these two outputs to uga_fin_sf1 to create gwr_sf_fixed.

gwr_fixed_output <- gwr_fixed$SDF %>% 
  st_drop_geometry() %>%
  as_tibble() %>%
  select(-c(2:11))

sig_recode <- function(x) {
  factor(if_else(x == 1, "Significant", "Non-significant"), levels = c("Non-significant","Significant"))
}

gwr_bonferroni_sig <- gwr_fixed_tadj$SDF %>%
  as_tibble() %>% 
  select(contains("_bo")) %>%
  mutate_all(sig_recode)

gwr_sf_fixed <- cbind(uga_fin_sf1, gwr_fixed_output, gwr_bonferroni_sig)
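
The GWmodel documentation describes gwr.t.adjust() as returning adjusted p-values, so it may be worth inspecting the _p_bo columns before recoding them; under that reading, small values rather than values of 1 would mark significance. The sketch below shows a threshold-based alternative in base R. The function name sig_recode_p, the 0.05 cut-off, and the toy values are assumptions for illustration and are not part of the original analysis.

```r
# Threshold-based recode: adjusted p-values below alpha are "Significant"
sig_recode_p <- function(p, alpha = 0.05) {
  factor(ifelse(p < alpha, "Significant", "Non-significant"),
         levels = c("Non-significant", "Significant"))
}

# Toy adjusted p-values (hypothetical)
p_toy <- c(0, 0.003, 0.2, 1)

# Inspect the distinct values first, then recode
print(sort(unique(p_toy)))
print(table(sig_recode_p(p_toy)))
```

Checking the distinct values in each column first shows whether the data really are binary and which direction the coding runs.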

6.3 Visualising Local R²

The map of local R² values is shown below. The plot indicates that R² values are higher in eastern Uganda and decrease toward the west, with local R² ranging from 0.817 to 0.892.

tmap_mode("view")
tm_shape(uga_fin_sf1) +
  tm_polygons(alpha = 0.4) +
  tm_shape(gwr_sf_fixed) +  
  tm_dots(col = "Local_R2",
          alpha = 0.6,
          style="quantile") +
  tm_view(set.zoom.limits = c(6,9))
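
The style = "quantile" option groups values into classes that hold roughly equal numbers of observations. A minimal base-R sketch of that idea, using simulated local R² values within the range reported above; the five-class split and the simulated data are assumptions, and tmap's own break computation may differ in detail.

```r
set.seed(42)
# Simulated local R-squared values for 123 districts (hypothetical)
local_r2 <- runif(123, min = 0.817, max = 0.892)

# Quantile breaks for five classes
breaks <- quantile(local_r2, probs = seq(0, 1, by = 0.2))

# Assign each district to a class and count members per class
classes <- cut(local_r2, breaks = breaks, include.lowest = TRUE)
print(table(classes))
```

Because each class holds a similar number of districts, a quantile legend can make small absolute differences look pronounced when the underlying range is narrow.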

6.4 Visualising Residuals

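The map below displays the residuals stored in the mlr_residuals column, which range from -0.759 to 0.753, the same minimum and maximum reported for the global regression in the printout above. These residuals are small relative to the financial inclusion score range of 0 to 7. The GWR model's own residuals are held in the residual column of the SDF output and could be mapped in the same way.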

tmap_mode("view")
tm_shape(uga_fin_sf1) +
  tm_polygons(alpha = 0.4) +
  tm_shape(gwr_sf_fixed) +  
  tm_dots(col = "mlr_residuals",
          alpha = 0.6,
          style ="quantile",
          title = "Residuals") +
  tm_view(set.zoom.limits = c(6,9))

6.5 Visualising Coefficient Standard Errors and Significance

The significance map classifies the Age (25–34) coefficient as statistically significant in all districts. Its standard error varies geographically, however: it is smallest in central Uganda and increases towards the periphery.

tmap_mode("view")
SE <- tm_shape(uga_fin_sf1) +
  tm_polygons(alpha = 0.4) +
  tm_shape(gwr_sf_fixed) +  
  tm_dots(col = "age25_34_SE",
          alpha = 0.6,
          style="quantile",
          title = "Std Error") +
  tm_view(set.zoom.limits = c(6,9))

pval <- tm_shape(uga_fin_sf1) +
  tm_polygons(alpha = 0.4) +
  tm_shape(gwr_sf_fixed) +  
  tm_dots(col = "age25_34_p_bo",
          alpha = 0.6,
          palette = c("lightgrey", "red"),
          title = "Significance") +
  tm_view(set.zoom.limits = c(6,9))

tmap_arrange(SE, pval, asp=1, ncol=2, sync = TRUE)

The Age (35–44) coefficient is classified as statistically significant in most districts, the exception being those in western Uganda. Its standard error also varies geographically: it is smallest in central Uganda and increases towards the outer regions.

tmap_mode("view")
SE <- tm_shape(uga_fin_sf1) +
  tm_polygons(alpha = 0.4) +
  tm_shape(gwr_sf_fixed) +  
  tm_dots(col = "age35_44_SE",
          alpha = 0.6,
          style="quantile",
          title = "Std Error") +
  tm_view(set.zoom.limits = c(6,9))

pval <- tm_shape(uga_fin_sf1) +
  tm_polygons(alpha = 0.4) +
  tm_shape(gwr_sf_fixed) +  
  tm_dots(col = "age35_44_p_bo",
          alpha = 0.6,
          palette = c("lightgrey", "red"),
          title = "Significance") +
  tm_view(set.zoom.limits = c(6,9))

tmap_arrange(SE, pval, asp=1, ncol=2, sync = TRUE)

The Education (Sec) coefficient is classified as statistically significant in all districts. Its standard error varies geographically, however: it is smallest in central Uganda and increases towards the periphery.

tmap_mode("view")
SE <- tm_shape(uga_fin_sf1) +
  tm_polygons(alpha = 0.4) +
  tm_shape(gwr_sf_fixed) +  
  tm_dots(col = "education_sec_SE",
          alpha = 0.6,
          style="quantile",
          title = "Std Error") +
  tm_view(set.zoom.limits = c(6,9))

pval <- tm_shape(uga_fin_sf1) +
  tm_polygons(alpha = 0.4) +
  tm_shape(gwr_sf_fixed) +  
  tm_dots(col = "education_sec_p_bo",
          alpha = 0.6,
          palette = c("lightgrey", "red"),
          title = "Significance") +
  tm_view(set.zoom.limits = c(6,9))

tmap_arrange(SE, pval, asp=1, ncol=2, sync = TRUE)

The Agricultural Business coefficient is classified as statistically significant in all districts. Its standard error varies geographically, however: it is smallest in central Uganda and increases towards the periphery.

tmap_mode("view")
SE <- tm_shape(uga_fin_sf1) +
  tm_polygons(alpha = 0.4) +
  tm_shape(gwr_sf_fixed) +  
  tm_dots(col = "is_agribusiness_SE",
          alpha = 0.6,
          style="quantile",
          title = "Std Error") +
  tm_view(set.zoom.limits = c(6,9))

pval <- tm_shape(uga_fin_sf1) +
  tm_polygons(alpha = 0.4) +
  tm_shape(gwr_sf_fixed) +  
  tm_dots(col = "is_agribusiness_p_bo",
          alpha = 0.6,
          palette = c("lightgrey", "red"),
          title = "Significance") +
  tm_view(set.zoom.limits = c(6,9))

tmap_arrange(SE, pval, asp=1, ncol=2, sync = TRUE)

The Earned Income (Medium) coefficient is classified as statistically significant in all districts. Its standard error varies geographically, however: it is smallest in central Uganda and increases towards the periphery.

tmap_mode("view")
SE <- tm_shape(uga_fin_sf1) +
  tm_polygons(alpha = 0.4) +
  tm_shape(gwr_sf_fixed) +  
  tm_dots(col = "earned_med_SE",
          alpha = 0.6,
          style="quantile",
          title = "Std Error") +
  tm_view(set.zoom.limits = c(6,9))

pval <- tm_shape(uga_fin_sf1) +
  tm_polygons(alpha = 0.4) +
  tm_shape(gwr_sf_fixed) +  
  tm_dots(col = "earned_med_p_bo",
          alpha = 0.6,
          palette = c("lightgrey", "red"),
          title = "Significance") +
  tm_view(set.zoom.limits = c(6,9))

tmap_arrange(SE, pval, asp=1, ncol=2, sync = TRUE)

We observe a similar pattern for the Earned Income (High) variable, although its standard errors span a higher range than those of the other variables.

tmap_mode("view")
SE <- tm_shape(uga_fin_sf1) +
  tm_polygons(alpha = 0.4) +
  tm_shape(gwr_sf_fixed) +  
  tm_dots(col = "earned_high_SE",
          alpha = 0.6,
          style="quantile",
          title = "Std Error") +
  tm_view(set.zoom.limits = c(6,9))

pval <- tm_shape(uga_fin_sf1) +
  tm_polygons(alpha = 0.4) +
  tm_shape(gwr_sf_fixed) +  
  tm_dots(col = "earned_high_p_bo",
          alpha = 0.6,
          palette = c("lightgrey", "red"),
          title = "Significance") +
  tm_view(set.zoom.limits = c(6,9))

tmap_arrange(SE, pval, asp=1, ncol=2, sync = TRUE)

We observe a similar pattern for the Own Mobile Phone variable.

tmap_mode("view")
SE <- tm_shape(uga_fin_sf1) +
  tm_polygons(alpha = 0.4) +
  tm_shape(gwr_sf_fixed) +  
  tm_dots(col = "own_mobile_phone_SE",
          alpha = 0.6,
          style="quantile",
          title = "Std Error") +
  tm_view(set.zoom.limits = c(6,9))

pval <- tm_shape(uga_fin_sf1) +
  tm_polygons(alpha = 0.4) +
  tm_shape(gwr_sf_fixed) +  
  tm_dots(col = "own_mobile_phone_p_bo",
          alpha = 0.6,
          palette = c("lightgrey", "red"),
          title = "Significance") +
  tm_view(set.zoom.limits = c(6,9))

tmap_arrange(SE, pval, asp=1, ncol=2, sync = TRUE)

We observe a similar pattern for the Financial Literacy (Planning & Budgeting) variable.

tmap_mode("view")
SE <- tm_shape(uga_fin_sf1) +
  tm_polygons(alpha = 0.4) +
  tm_shape(gwr_sf_fixed) +  
  tm_dots(col = "finliteracy_plan_SE",
          alpha = 0.6,
          style="quantile",
          title = "Std Error") +
  tm_view(set.zoom.limits = c(6,9))

pval <- tm_shape(uga_fin_sf1) +
  tm_polygons(alpha = 0.4) +
  tm_shape(gwr_sf_fixed) +  
  tm_dots(col = "finliteracy_plan_p_bo",
          alpha = 0.6,
          palette = c("lightgrey", "red"),
          title = "Significance") +
  tm_view(set.zoom.limits = c(6,9))

tmap_arrange(SE, pval, asp=1, ncol=2, sync = TRUE)

The Financial Literacy (Saving Behaviour) coefficient is classified as statistically significant in most districts, except for a few in southern Uganda. Its standard error also varies geographically: it is smallest in central Uganda and increases towards the outer regions.

tmap_mode("view")
SE <- tm_shape(uga_fin_sf1) +
  tm_polygons(alpha = 0.4) +
  tm_shape(gwr_sf_fixed) +  
  tm_dots(col = "finliteracy_save_SE",
          alpha = 0.6,
          style="quantile",
          title = "Std Error") +
  tm_view(set.zoom.limits = c(6,9))

pval <- tm_shape(uga_fin_sf1) +
  tm_polygons(alpha = 0.4) +
  tm_shape(gwr_sf_fixed) +  
  tm_dots(col = "finliteracy_save_p_bo",
          alpha = 0.6,
          palette = c("lightgrey", "red"),
          title = "Significance") +
  tm_view(set.zoom.limits = c(6,9))

tmap_arrange(SE, pval, asp=1, ncol=2, sync = TRUE)

The significance map classifies the Financial Literacy (Awareness of Financial Products) coefficient as non-significant in every district, even though this variable has the largest t-value in the global regression (6.897).

tmap_mode("view")
SE <- tm_shape(uga_fin_sf1) +
  tm_polygons(alpha = 0.4) +
  tm_shape(gwr_sf_fixed) +  
  tm_dots(col = "finliteracy_aware_SE",
          alpha = 0.6,
          style="quantile",
          title = "Std Error") +
  tm_view(set.zoom.limits = c(6,9))

pval <- tm_shape(uga_fin_sf1) +
  tm_polygons(alpha = 0.4) +
  tm_shape(gwr_sf_fixed) +  
  tm_dots(col = "finliteracy_aware_p_bo",
          alpha = 0.6,
          palette = c("lightgrey", "red"),
          title = "Significance") +
  tm_view(set.zoom.limits = c(6,9))

tmap_arrange(SE, pval, asp=1, ncol=2, sync = TRUE)
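
The map readings above can be cross-checked numerically by tabulating the share of districts flagged as significant for each coefficient. A minimal base-R sketch on a simulated data frame; the column names mirror the _p_bo naming used above, but the values are random and purely illustrative.

```r
set.seed(1)
lv <- c("Non-significant", "Significant")
n  <- 123

# Simulated significance flags for two coefficients (hypothetical values)
sig_df <- data.frame(
  age25_34_p_bo = factor(sample(lv, n, replace = TRUE, prob = c(0.1, 0.9)),
                         levels = lv),
  age35_44_p_bo = factor(sample(lv, n, replace = TRUE, prob = c(0.4, 0.6)),
                         levels = lv)
)

# Proportion of districts flagged "Significant" per coefficient
share_sig <- sapply(sig_df, function(f) mean(f == "Significant"))
print(round(share_sig, 2))
```

Applied to the real output, the same sapply() call could be run on the recoded columns after dropping the geometry.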

7. Concluding Remarks

This study employed multiple linear regression (MLR) and geographically weighted regression (GWR) to identify factors influencing financial inclusion at the district level in Uganda. Using stepwise forward selection, the following variables were found to explain financial inclusion:

  • Proportion aged 25–34
  • Proportion aged 35–44
  • Proportion with secondary education
  • Proportion involved in agricultural businesses
  • Proportion with medium earned income
  • Proportion with high earned income
  • Proportion owning mobile phones
  • Mean financial literacy score for planning/budgeting
  • Mean financial literacy score for saving behaviours
  • Mean financial literacy score for awareness of financial products

The Global Moran’s I permutation test confirmed the presence of spatial autocorrelation, suggesting that the relationships underlying financial inclusion vary geographically. In the GWR model, all MLR variables, except for financial literacy on awareness of financial products, remained significant predictors. Notably, some variables, such as the proportion aged 35–44 and financial literacy on saving behaviours, were not significant in certain districts, highlighting the importance of accounting for local variation.

This study demonstrates the value of geographically weighted regression in capturing spatial nuances within an explanatory model of financial inclusion. However, a limitation lies in the ambiguity regarding causation: it remains unclear whether higher financial literacy drives financial inclusion, whether it results from greater access to financial services within certain districts, or whether the two are mutually reinforcing. Future research could explore these relationships to better understand the directionality of, and potential interactions among, these predictors.

In conclusion, this study underscores the complexity of financial inclusion and the need for geographically tailored approaches in policy-making to address district-specific needs across Uganda.

Reference